Abstract. Given an $n \times n$ complex matrix $A$, let

$$\mu_A(x,y) := \frac{1}{n}\,\bigl|\{1 \le i \le n : \operatorname{Re}\lambda_i \le x,\ \operatorname{Im}\lambda_i \le y\}\bigr|$$

be the empirical spectral distribution (ESD) of its eigenvalues $\lambda_i \in \mathbb{C}$, $i = 1, \dots, n$. We consider the limiting distribution (both in probability and in the almost sure convergence sense) of the normalized ESD $\mu_{\frac{1}{\sqrt{n}}A_n}$ of a random matrix $A_n = (a_{ij})_{1 \le i,j \le n}$ where the random variables $a_{ij} - \mathbf{E}(a_{ij})$ are iid copies of a fixed random variable $x$ with unit variance. We prove a universality principle for such ensembles, namely that the limit distribution in question is independent of the actual choice of $x$. In particular, in order to compute this distribution, one can assume that $x$ is real or complex gaussian. As a related result, we show how laws for this ESD follow from laws for the singular value distribution of $\frac{1}{\sqrt{n}}A_n - zI$ for complex $z$.

As a corollary we establish the Circular Law conjecture (both almost surely and in probability), that asserts that $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges to the uniform measure on the unit disk when the $a_{ij}$ have zero mean.
1. Introduction

1.1. Empirical spectral distributions. This paper is concerned with the convergence of empirical spectral distributions of random matrices, both in the sense of convergence in probability and in the almost sure sense.

Definition 1.2 (Modes of convergence). For each $n$, let $F_n$ be a random variable taking values in some Hausdorff topological space $X$, and let $F$ be another element of $X$.

• We say that $F_n$ converges in probability to $F$ if for every neighbourhood $V$ of $F$, we have $\lim_{n \to \infty} \mathbf{P}(F_n \in V) = 1$.

• We say that $F_n$ converges almost surely to $F$ if we have $\mathbf{P}(\lim_{n \to \infty} F_n = F) = 1$.
Similarly, if $X_n$ is a scalar random variable, we say that $X_n$ is bounded in probability if we have

$$\lim_{C \to \infty} \liminf_{n \to \infty} \mathbf{P}(|X_n| \le C) = 1$$

and almost surely bounded if we have

$$\mathbf{P}\bigl(\limsup_{n \to \infty} |X_n| < \infty\bigr) = 1.$$



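Let $M_n(\mathbb{C})$ denote the set of $n \times n$ complex matrices. For $A \in M_n(\mathbb{C})$, we let

$$\mu_A(s,t) := \frac{1}{n}\,\bigl|\{1 \le i \le n : \operatorname{Re}\lambda_i \le s,\ \operatorname{Im}\lambda_i \le t\}\bigr|$$

be the empirical spectral distribution (ESD) of its eigenvalues $\lambda_i \in \mathbb{C}$, $i = 1, \dots, n$. This is a discrete probability measure on $\mathbb{C}$.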

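Now suppose that $A_n \in M_n(\mathbb{C})$ is a random matrix ensemble (i.e. a probability distribution on $M_n(\mathbb{C})$), and let $\mu_\infty$ be a probability measure on $\mathbb{C}$. We give the space of probability measures on $\mathbb{C}$ the usual vague topology, thus a sequence of deterministic measures $\mu_n$ converges to $\mu$ iff $\int_{\mathbb{C}} f\,d\mu_n$ converges to $\int_{\mathbb{C}} f\,d\mu$ for every test function (i.e. continuous and compactly supported function) $f : \mathbb{C} \to \mathbb{R}$. Thus, by Definition 1.2, we see that $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges in probability to $\mu_\infty$ if for every continuous and compactly supported function $f : \mathbb{C} \to \mathbb{R}$, the expression

$$\int_{\mathbb{C}} f(z)\,d\mu_{\frac{1}{\sqrt{n}}A_n}(z) - \int_{\mathbb{C}} f(z)\,d\mu_\infty(z) \qquad (1)$$

converges to zero in probability, thus

$$\lim_{n \to \infty} \mathbf{P}\Bigl(\Bigl|\int_{\mathbb{C}} f(z)\,d\mu_{\frac{1}{\sqrt{n}}A_n}(z) - \int_{\mathbb{C}} f(z)\,d\mu_\infty(z)\Bigr| \ge \varepsilon\Bigr) = 0$$

for every $\varepsilon > 0$. Similarly, $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges almost surely to $\mu_\infty$ if with probability 1, the expression (1) converges to zero for all $f : \mathbb{C} \to \mathbb{R}$.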

Remark 1.3. In practice, our matrices $A_n$ will have bounded entries on the average, which suggests (by the Weyl comparison inequality, see Lemma A.2) that their eigenvalues should be of size about $O(\sqrt{n})$, thus the normalization by $\frac{1}{\sqrt{n}}$ is natural.

1.4. Universality. A fundamental problem in the theory of random 
matrices is to determine the limiting distribution of the ESD of a ran- 
dom matrix ensemble (either in probability or in the almost sure sense) , 
as the size of the random matrix tends to infinity. 

The situation with this problem, so far, is that the analysis depends 
very much on which ensemble one is dealing with. In some cases such as 
when the entries have gaussian distribution, powerful group-theoretic
structure (e.g. invariance under the orthogonal group $O(n)$ or unitary
group $U(n)$) plays an essential role, as one can use it to derive an
explicit formula for the joint distribution of the eigenvalues. The limiting
distribution can then be computed directly from this formula. In the 
majority of cases, however, there is little symmetry, and such a formula 
is not available. Consequently, the problem becomes much harder and 
its analysis typically requires tools from various areas of mathematics. 

On the other hand, there is a well-known intuition behind this prob- 
lem (and many others concerning random matrices), the universality 
phenomenon, that asserts that the limiting distribution should not de- 
pend on the particular distribution of the entries. This phenomenon 
motivates many theorems and conjectures in the area. In the follow- 
ing, we mention two famous examples, Wigner's semi-circle law and 
the Circular Law conjecture. 

Wigner's semi-circle law. In the 1950's, motivated by numerical experiments, Wigner [2B] proved that the ESD of an $n \times n$ hermitian matrix with (upper diagonal) entries being iid gaussian random variables converges to the semi-circle law $F$ whose density is given by

$$\rho(x) := \begin{cases} \frac{1}{2\pi}\sqrt{4 - x^2}, & |x| \le 2, \\ 0, & |x| > 2. \end{cases}$$

Wigner's result (which holds for both modes of convergence) was later extended to many other ensembles. The most general form only requires the mean and variance of the entries [ini 12]:

Theorem 1.5. Let $A_n$ be the $n \times n$ hermitian random matrix whose upper diagonal entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}}A_n$ converges (both in probability and in the almost sure sense) to the semi-circle distribution.

Circular Law Conjecture. The well-known Circular Law conjecture 
deals with non-hermitian matrices. 

Conjecture 1.6. Let $A_n$ be the $n \times n$ random matrix whose entries are iid complex random variables with mean 0 and variance 1. Then the ESD of $\frac{1}{\sqrt{n}}A_n$ converges (both in probability and in the almost sure sense) to the uniform distribution on the unit disk.

Similarly to Wigner's law, this conjecture was posed, based on numerical evidence, in the 1950's. The case when the entries have complex gaussian distribution was verified by Mehta [H] in 1967, using Ginibre's formula for the joint density function of the eigenvalues of $A_n$ (see, for example, [2, Chapter 10]):

$$p(\lambda_1, \dots, \lambda_n) = c_n \prod_{i<j} |\lambda_i - \lambda_j|^2 \exp\Bigl(-n \sum_{i=1}^n |\lambda_i|^2\Bigr). \qquad (2)$$

Another case where such a formula is available is when the entries have real gaussian distribution, and for this case the conjecture was confirmed by Edelman |i6j. For the general case when there is no formula, the problem appears much harder. Important partial results were obtained by Girko [7], [8], Bai [H [2], and more recently Götze-Tikhomirov [9], [10], Pan-Zhou [15] and the authors [26]. These results establish the conjecture (in almost sure or in probability forms) under additional assumptions on the distribution $x$. The strongest result in the previous literature is from [26], [10], in which the almost sure and in probability forms of the conjecture respectively were shown under the extra assumption that the entries have finite $(2+\varepsilon)$-th moment for any positive constant $\varepsilon$. An attempt to remove this extra $\varepsilon$ (and thus prove Conjecture 1.6 in full generality) was a motivation for this paper.

A demonstration of the circular law for the Bernoulli and the Gaussian
case appears in Figure 1.

In both the semi-circular law and the circular law, we observe that only the mean and variance of the entries play a role in the limiting distribution. This is a common situation, in fact, for many other conjectures in random matrix theory, such as Dyson's conjecture [HJ Chapter 1], and this phenomenon is sometimes referred to as universality in the literature.

In this paper, we rigorously prove the universality phenomenon for the ESD of random matrices. More precisely, we show that the limiting distribution of the ESD of a random matrix ensemble $A_n$ depends only on the mean and variance of its entries, under a mild size condition on the mean $\mathbf{E}A_n$, and under the assumption that the matrix $A_n - \mathbf{E}A_n$ has iid entries.

For any matrix $A$, we define the Hilbert-Schmidt norm $\|A\|_2$ by the
formula $\|A\|_2 := \operatorname{trace}(AA^*)^{1/2} = \operatorname{trace}(A^*A)^{1/2}$.

Theorem 1.7 (Universality principle). Let $x$ and $y$ be complex random
variables with zero mean and unit variance. Let $X_n = (x_{ij})_{1 \le i,j \le n}$ and
$Y_n := (y_{ij})_{1 \le i,j \le n}$ be $n \times n$ random matrices whose entries $x_{ij}$, $y_{ij}$ are iid


UNIVERSALITY OF ESDS AND THE CIRCULAR LAW 

[Figure 1 panels: Bernoulli (left), Gaussian (right).]

Figure 1. Eigenvalue plots of two randomly generated 5000 by 5000 matrices. On the left, each entry was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2. On the right, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. (These two distributions were shifted by adding the identity matrix, thus the circles are centered at (1,0) rather than at the origin.)

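copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying

$$\sup_n \frac{1}{n^2} \|M_n\|_2^2 < \infty. \qquad (3)$$

Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges in probability to zero. If furthermore we make the additional hypothesis that the ESDs

$$\mu_{(\frac{1}{\sqrt{n}}M_n - zI)(\frac{1}{\sqrt{n}}M_n - zI)^*} \qquad (4)$$

converge to a limit for almost every $z$, then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges almost surely to zero.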

Remark 1.8. The theorem still holds if we restrict the size of the matrices
to an infinite subsequence $n_1 < n_2 < \dots$ of positive integers.
This freedom to pass to a subsequence is useful for technical reasons 
involving compactness arguments. 



The condition (3) has the following useful consequence, which we shall use repeatedly:

Lemma 1.9 (Tightness of ESDs). Let $M_n$ and $A_n$ be as in Theorem 1.7. Then the quantities $\frac{1}{n^2}\|A_n\|_2^2$ and $\int_{\mathbb{C}} |z|^2\,d\mu_{\frac{1}{\sqrt{n}}A_n}(z)$ are almost surely bounded (and hence also bounded in probability).

Proof. By the Weyl comparison inequality (Lemma A.2) it suffices to show that $\frac{1}{n^2}\|A_n\|_2^2$ is almost surely bounded. By (3) and the triangle inequality it suffices to show that $\frac{1}{n^2}\|X_n\|_2^2$ is almost surely bounded. But this follows from the finite second moment of $x$ and the strong law of large numbers. □
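The chain of reductions in this proof can be written out explicitly. The display below is a sketch under one assumption: that Lemma A.2 is applied in the form $\sum_i |\lambda_i(A)|^2 \le \|A\|_2^2$ (the appendix itself is not reproduced in this section).

```latex
\begin{align*}
\int_{\mathbb{C}} |z|^2 \, d\mu_{\frac{1}{\sqrt{n}} A_n}(z)
  &= \frac{1}{n} \sum_{i=1}^n \Bigl| \frac{\lambda_i(A_n)}{\sqrt{n}} \Bigr|^2
   = \frac{1}{n^2} \sum_{i=1}^n |\lambda_i(A_n)|^2
   \le \frac{1}{n^2} \|A_n\|_2^2, \\
\frac{1}{n^2} \|A_n\|_2^2
  &\le \frac{2}{n^2} \|M_n\|_2^2 + \frac{2}{n^2} \|X_n\|_2^2,
  \qquad
  \frac{1}{n^2} \|X_n\|_2^2 = \frac{1}{n^2} \sum_{i,j=1}^n |x_{ij}|^2 .
\end{align*}
```

The first term on the second line is controlled by (3), and the last expression is an average of $n^2$ iid copies of $|x|^2$, which is where the strong law of large numbers enters.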



As an immediate corollary of Theorem 1.7, we have

Corollary 1.10 (Universality principle). Let $x$, $y$ be complex random variables with zero mean and unit variance. Let $X_n$ and $Y_n$ be $n \times n$ random matrices whose entries are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying (3). Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then if $\mu_{\frac{1}{\sqrt{n}}B_n}$ converges in probability to a limiting measure $\mu$, then $\mu_{\frac{1}{\sqrt{n}}A_n}$ also converges in probability to $\mu$. If furthermore we make the additional hypothesis that the ESDs (4) converge to a limit for almost every $z$, then we can replace "in probability" by "almost surely" in the previous sentence.

A demonstration of this corollary appears in Figure 2.
Remark 1.11. One consequence of Corollary 1.10 (in the case when (4) converges to a limit) is that the ESD $\mu_{\frac{1}{\sqrt{n}}A_n}$ behaves asymptotically deterministically, in the sense that there exists a deterministic measure $\mu_n$ for each $n$ such that $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_n$ converges almost surely to zero. Indeed, one can simply take $\mu_n$ to be an instance of $\mu_{\frac{1}{\sqrt{n}}B_n}$, where the $B_n$ are selected independently of the $A_n$, and the claim will hold almost surely. The question remains as to whether $\mu_n$ itself converges to some limit as $n \to \infty$; we partially address this issue in Theorem 1.23 below.



1.12. The Circular Law Conjecture. Thanks to Corollary 1.10, we can reduce the problem of computing the limiting distribution to the case when the entries are gaussian (or have any special distribution satisfying the variance bound). In particular, since the Circular Law is verified for random matrices with complex gaussian entries (see [H]), it follows that this law (both in probability and in the almost sure sense) holds in full generality. In other words, we have shown



Footnote to Remark 1.11: The authors thank Oded Schramm for this observation.

Footnote to Section 1.12: The idea of establishing a limiting law by first replacing a general random variable with a gaussian one is sometimes referred to as the "Lindeberg trick" in the literature.
[Figure 2 panels: Bernoulli (left column), Gaussian (right column).]

Figure 2. Eigenvalue plots of randomly generated $n$ by $n$ matrices of the form $D_n + M_n$, where $n = 5000$. In the left column, each entry of $M_n$ was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2, and in the right column, each entry was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$. In the first row, $D_n$ is the deterministic matrix $\operatorname{diag}(1, 1, \dots, 1, 2.5, 2.5, \dots, 2.5)$, and in the second row $D_n$ is the deterministic matrix $\operatorname{diag}(1, 1, \dots, 1, 2.8, 2.8, \dots, 2.8)$ (in each case, the first $n/2$ diagonal entries are 1's, and the remaining entries are 2.5 or 2.8 as specified).



Theorem 1.13 (Circular Law). Let $X_n$ be the $n \times n$ random matrix
whose entries are iid complex random variables with mean 0 and variance 1.
Then the ESD of $\frac{1}{\sqrt{n}}X_n$ converges (both in probability and in
the almost sure sense) to the uniform distribution on the unit disk.
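The statement can be probed numerically. The following is a minimal sketch, not part of the paper: it assumes NumPy is available, uses Bernoulli $\pm 1$ entries, and the helper name `fraction_in_disk` is purely illustrative. Under the uniform law on the unit disk, the disk of radius $r \le 1$ carries mass $r^2$, so the empirical fraction of eigenvalues of $\frac{1}{\sqrt{n}}X_n$ with modulus at most $r$ should be close to $\min(r^2, 1)$ for large $n$.

```python
import numpy as np


def fraction_in_disk(n, radius, seed=0):
    """Fraction of eigenvalues of X/sqrt(n) with modulus <= radius.

    X is an n x n matrix with iid entries equal to +1 or -1 with
    probability 1/2 each (mean 0, variance 1).
    """
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, n))
    eigenvalues = np.linalg.eigvals(X / np.sqrt(n))
    return float(np.mean(np.abs(eigenvalues) <= radius))


if __name__ == "__main__":
    n = 400
    for r in (0.5, 0.8, 1.1):
        # Compare the empirical mass with the mass predicted by the circular law.
        print(r, fraction_in_disk(n, r), min(r * r, 1.0))
```

The agreement at moderate $n$ is only approximate; the theorem is an asymptotic statement and says nothing about the rate.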



Remark 1.14. In [26] (see also [10] for an alternate proof for the in probability sense), this theorem was proven with the extra assumption that the entries have finite $(2+\varepsilon)$-th moment for any fixed $\varepsilon > 0$; earlier related results appear in [3 El [U O [9].



Notice that in Theorem 1.13, we set $M_n$ to be the all zero matrix (for which the boundedness and convergence hypotheses are trivial). In [12], explicit distributions were computed for the case when $M_n$ is an arbitrary diagonal matrix and $X_n$ has iid gaussian entries. The formula for the limiting distribution is somewhat technical, but its support is easy to describe: it is exactly the set of $z \in \mathbb{C}$ for which

$$\int |z - x|^{-2}\,d\mu(x) \ge 1,$$

where $\mu$ is the limiting distribution of the ESD of $M_n$. (In the case $M_n$ is all zero, $\mu$ has all its mass at the origin, and so the set of $z$ is the unit disk.)
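The support criterion is easy to evaluate when $\mu$ is a finite combination of point masses. The sketch below is illustrative only: the function name `in_limiting_support` and the representation of $\mu$ as a list of `(location, weight)` pairs are assumptions made here, not notation from [12].

```python
def in_limiting_support(z, atoms):
    """Check the criterion  integral |z - x|^{-2} dmu(x) >= 1.

    `atoms` is a list of (location, weight) pairs describing a discrete
    probability measure mu; the weights are assumed to sum to 1.
    """
    total = 0.0
    for location, weight in atoms:
        squared_distance = abs(z - location) ** 2
        if squared_distance == 0.0:
            # The integral diverges at an atom, so the criterion holds.
            return True
        total += weight / squared_distance
    return total >= 1.0


if __name__ == "__main__":
    origin = [(0.0, 1.0)]                  # M_n = 0: all mass at the origin
    two_atoms = [(1.0, 0.5), (2.5, 0.5)]   # half the mass at 1, half at 2.5
    print(in_limiting_support(0.5 + 0.5j, origin))  # inside the unit disk
    print(in_limiting_support(1.5, origin))         # outside the unit disk
    print(in_limiting_support(1.75, two_atoms))
```

With all mass at the origin the criterion reduces to $|z|^{-2} \ge 1$, i.e. $|z| \le 1$, which recovers the unit disk as stated in the text.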



The proof of Theorem 1.7 actually shows that if $M_n$ and $M'_n$ both obey (3) and have the property that the difference between the ESD (4) and the counterpart for $M'_n$ converges to zero for almost every $z$, then Theorem 1.7 holds with $A_n := M_n + X_n$ and $B_n := M'_n + Y_n$ (see Remark B.3).



This has the following interesting consequence. Assume that $M_n$ is a matrix with low rank, say $o(n)$. In this case, it is easy to see that the ESD (4) concentrates at $|z|^2$, since the matrix involved here is a self-adjoint low rank perturbation of $|z|^2 I$. Thus, we can replace $M_n$ by the zero matrix and obtain

Corollary 1.15 (Circular Law for shifted matrices). Let $X_n$ be the $n \times n$ random matrix whose entries are iid complex random variables with mean 0 and variance 1 and $M_n$ be a deterministic matrix with rank $o(n)$ and obeying (3). Let $A_n := M_n + X_n$. Then the ESD of $\frac{1}{\sqrt{n}}A_n$ converges (in either sense) to the uniform distribution on the unit disk.



In particular, it shows that Theorem 1.13 still holds if the entries have
(the same) non-zero mean. This extends a result of Chafaï, which
in addition assumed that the entries had finite fourth moment.



1.16. Extensions. We can extend Theorem 1.7 in several ways. First, by conditioning, we can obtain a theorem for $M_n$ being a random matrix.

Theorem 1.17 (Universality from a random base matrix). Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n = (x_{ij})_{1 \le i,j \le n}$ and $Y_n = (y_{ij})_{1 \le i,j \le n}$ be $n \times n$ random matrices whose entries are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a random $n \times n$ matrix, independent of $X_n$ or $Y_n$, such that $\frac{1}{n^2}\|M_n\|_2^2$ is bounded in probability (see Definition 1.2). Let $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges in probability to zero. If we furthermore assume that $\frac{1}{n^2}\|M_n\|_2^2$ is almost surely bounded, and (4) converges almost surely to some limit for almost every $z$, then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges almost surely to zero.



We can also address a more general form of random matrices (cf. [8]). Let $K_n$, $L_n$ be two sequences of matrices. Define $A_n := M_n + K_n X_n L_n$ and $B_n := M_n + K_n Y_n L_n$. We can show that under some mild assumptions on $M_n$, $K_n$, $L_n$, Theorem 1.7 still holds:




[Figure 3 panels: Bernoulli (left), Gaussian (right).]

Figure 3. Eigenvalue plots of two randomly generated 5000 by 5000 matrices of the form $A + BM_nB$, where $A$ and $B$ are diagonal matrices having $n/2$ entries with the value 1 followed by $n/2$ entries with the value 5 (for D) and the value 2 (for X). On the left, each entry of $M_n$ was an iid Bernoulli random variable, taking the values $+1$ and $-1$ each with probability 1/2. On the right, each entry of $M_n$ was an iid Gaussian normal random variable, with probability density function $\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$.



Theorem 1.18. Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n$ and $Y_n$ be $n \times n$ random matrices whose entries are iid copies of $x$ and $y$, respectively. Let $M_n$, $K_n$, $L_n$ be random $n \times n$ matrices (independent of $X_n$, $Y_n$) and let $A_n := M_n + K_n X_n L_n$ and $B_n := M_n + K_n Y_n L_n$. Assume that the expressions

$$\frac{1}{n^2}\|A_n\|_2^2 + \frac{1}{n^2}\|B_n\|_2^2 + [\dots]\,\|K_n^{-1} M_n L_n^{-1}\|_2^2 + [\dots]\,\|K_n^{-1} L_n^{-1}\|_2^2 \qquad (5)$$

are bounded in probability. Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges in probability to zero. If furthermore we assume that (5) is almost surely bounded, and that for almost every $z$ the ESDs

$$[\dots] \qquad (6)$$

converge almost surely to a limit, then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges almost surely to zero.



Note that Theorem 1.17 is the special case of Theorem 1.18 in which $K_n = L_n = I$. It seems of interest to see whether the hypotheses on (5) can be verified for various natural random or deterministic matrices $M_n$, $K_n$, $L_n$, normalised appropriately by a suitable power of $n$. We do not pursue this matter here.

A demonstration of the above theorem for the Bernoulli and the Gaussian case appears in Figure 3.

The proofs of these extensions are discussed in Section 7.
Another direction for generalization is to consider random matrices whose entries are independent, but not necessarily identically distributed. Most of the tools used in this paper (e.g. law of large numbers, Talagrand's inequality, and the least singular value bound from [26]) extend without difficulty to this setting. Furthermore, Krishnapur pointed out that one can also prove a "universal" version of Theorem B.1. This leads to a generalization in Appendix O (written by Krishnapur).

For similar reasons, one expects to be able to extend the above results to the case when $X_n$ and $Y_n$ are sparse iid random matrices; for instance, the least singular value bounds from [26] extend to this case, and the circular law for sparse iid matrices is already known in several cases [9], [26]. We, however, will not pursue these matters here.



1.19. Computing the ESD of a random non-hermitian matrix via the ESD of a hermitian one. Theorem 1.7 provides one useful way to compute the (limiting distribution of the) ESD of a random non-hermitian matrix, namely that one can restrict to any particular distribution (such as complex gaussian) of the entries. The proof of this theorem (with some modification) also provides another way to deal with this problem, namely that one can reduce the problem of computing the ESD of $\frac{1}{\sqrt{n}}A_n$ to that of $(\frac{1}{\sqrt{n}}A_n - zI)(\frac{1}{\sqrt{n}}A_n - zI)^*$, for fixed $z \in \mathbb{C}$. More precisely, we have the following equivalences.

Theorem 1.20 (Equivalences for convergence). Let $A_n$ be as in Theorem 1.7 and let $\mu$ be a probability measure on $\mathbb{C}$ with the second moment condition $\int_{\mathbb{C}} |z|^2\,d\mu(z) < \infty$. Then the following are equivalent:

(i) The ESD $\mu_{\frac{1}{\sqrt{n}}A_n}$ of $\frac{1}{\sqrt{n}}A_n$ converges in probability to $\mu$.
(ii) For almost every complex number $z$, $\frac{1}{n} \log |\det(\frac{1}{\sqrt{n}}A_n - zI)|$ converges in probability to $\int_{\mathbb{C}} \log |w - z|\,d\mu(w)$.
(iii) For almost every complex number $z$, there exists a sequence $\varepsilon_n > 0$ of positive numbers converging to zero such that $\frac{1}{n} \log \det\bigl(((\frac{1}{\sqrt{n}}A_n - zI) + \varepsilon_n I)((\frac{1}{\sqrt{n}}A_n - zI)^* + \varepsilon_n I)\bigr)$ converges in probability to $2\int_{\mathbb{C}} \log |w - z|\,d\mu(w)$.

If furthermore the ESDs (4) converge to a limit for almost every $z$, then we can replace convergence in probability by almost sure convergence in the above equivalences.

We prove this result in Section 8. As a corollary, we have a criterion for when $\frac{1}{\sqrt{n}}A_n$ converges to a distribution $\mu$:
Corollary 1.21. Let $A_n$ be as in Theorem 1.7 and let $\mu$ be a probability measure on $\mathbb{C}$ with the second moment condition $\int_{\mathbb{C}} |z|^2\,d\mu(z) < \infty$. Suppose that for almost every complex number $z$, the ESD of $(\frac{1}{\sqrt{n}}A_n - zI)(\frac{1}{\sqrt{n}}A_n - zI)^*$ converges in probability to a limiting distribution $\eta_z$ on $[0, +\infty)$ such that the integral $\int_0^\infty \log t\,d\eta_z(t)$ is absolutely convergent and equal to $2\int_{\mathbb{C}} \log |w - z|\,d\mu(w)$. Then the ESD of $\frac{1}{\sqrt{n}}A_n$ converges in probability to $\mu$. If the ESDs (4) converge to a limit for almost every $z$, then we can replace convergence in probability by almost sure convergence in the above implication.

Proof. We verify the claim for almost sure convergence only; the proof 
for convergence in probability is similar and is left as an exercise to the 
reader. 



By Lemma 1.9, we see that for fixed $z$, $|\frac{1}{n} \operatorname{trace}(\frac{1}{\sqrt{n}}A_n - zI)(\frac{1}{\sqrt{n}}A_n - zI)^*|$ is also almost surely bounded. Taking limits, we conclude that

$$\int_0^\infty t\,d\eta_z(t) < \infty.$$

We then see from the dominated convergence theorem that for any $\varepsilon > 0$, $\frac{1}{n} \log \det\bigl(((\frac{1}{\sqrt{n}}A_n - zI) + \varepsilon I)((\frac{1}{\sqrt{n}}A_n - zI)^* + \varepsilon I)\bigr)$ converges almost surely to $\int_0^\infty \log(t + \varepsilon)\,d\eta_z(t)$. From this we obtain hypothesis (iii) of Theorem 1.20 (if $\varepsilon_n$ is chosen to decay to zero sufficiently slowly), and the claim follows. □

Since the eigenvalues of $(\frac{1}{\sqrt{n}}A_n - zI)(\frac{1}{\sqrt{n}}A_n - zI)^*$ are the squares of the singular values of $\frac{1}{\sqrt{n}}A_n - zI$, we can also say that Theorem 1.20 reduces the problem of computing the limiting distribution of the eigenvalues of $\frac{1}{\sqrt{n}}A_n$ to that of the singular values of $\frac{1}{\sqrt{n}}A_n - zI$.
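The reduction to singular values can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than the authors' method: it assumes NumPy, uses Bernoulli $\pm 1$ entries with $M_n = 0$, and the function names are hypothetical. It computes $\frac{1}{n} \log |\det(\frac{1}{\sqrt{n}}X_n - zI)|$ once directly and once from the eigenvalues of the hermitian matrix $(\frac{1}{\sqrt{n}}X_n - zI)(\frac{1}{\sqrt{n}}X_n - zI)^*$, and compares both with the logarithmic potential of the uniform measure on the unit disk, which equals $(|z|^2 - 1)/2$ for $|z| \le 1$ and $\log |z|$ for $|z| > 1$.

```python
import numpy as np


def log_potential_empirical(X, z):
    """Return (1/n) log|det(X/sqrt(n) - zI)| computed in two ways."""
    n = X.shape[0]
    Y = X / np.sqrt(n) - z * np.eye(n)
    # Direct route: log-determinant of the non-hermitian matrix.
    _, log_abs_det = np.linalg.slogdet(Y)
    via_det = log_abs_det / n
    # Hermitian route: eigenvalues of Y Y^* are squared singular values of Y.
    hermitian_eigenvalues = np.linalg.eigvalsh(Y @ Y.conj().T)
    via_hermitian = 0.5 * float(np.mean(np.log(hermitian_eigenvalues)))
    return float(via_det), via_hermitian


def log_potential_circular(z):
    """Integral of log|w - z| against the uniform measure on the unit disk."""
    r = abs(z)
    return 0.5 * (r * r - 1.0) if r <= 1.0 else float(np.log(r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.choice([-1.0, 1.0], size=(300, 300))
    z = 0.3 + 0.4j
    print(log_potential_empirical(X, z), log_potential_circular(z))
```

The two empirical values agree up to rounding because $|\det Y|^2 = \det(YY^*)$; their closeness to the circular-law potential is the content of Theorem 1.20 (ii) in this special case.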

The big gain here is that the matrix $(\frac{1}{\sqrt{n}}A_n - zI)(\frac{1}{\sqrt{n}}A_n - zI)^*$ is hermitian. (Random matrices of this type are often called sample covariance matrices in the literature.) This allows one to use standard tools such as truncation, Wigner's moment method and the Stieltjes transform (see, for instance, the proof of Theorem 1.5 in [2, Chapter 2]), or results such as Theorem B.1; techniques from free probability are also very powerful for such problems. These methods cannot be applied to non-hermitian matrices for various reasons (see [2, Chapter 10] for a discussion) and their failure has been the main difficulty in attacking problems such as the Circular Law conjecture.



One can use Corollary 1.21 to give another proof of Theorem 1.13 without relying on explicit formulas such as (2). We omit the details.
1.22. Existence of the limit. The results in the previous chapters provide two different ways to compute (explicitly) the limiting measure of the ESD of random matrices. In fact there is a simple compactness argument that guarantees the existence of the limit, assuming of course that the deterministic ESDs (4) already converge, although the argument does not provide too much information on what the limit actually is. More precisely, we have

Theorem 1.23. Let $x$ be a complex random variable with zero mean and unit variance. Let $X_n$ be the $n \times n$ random matrix whose entries are iid copies of $x$. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying

$$\sup_n \frac{1}{n^2}\|M_n\|_2^2 < \infty. \qquad (7)$$

Assume furthermore that the ESD (4) converges for almost every $z \in \mathbb{C}$. Then the ESD of $\frac{1}{\sqrt{n}}A_n$, where $A_n := M_n + X_n$, converges (in both senses) to a limiting measure $\mu$.



Proof. We let $f_1, f_2, f_3, \dots$ be an enumeration of a sequence of test functions which is dense in the uniform topology (such a sequence exists thanks to the Stone-Weierstrass theorem and the compact support of test functions). By applying the Bolzano-Weierstrass theorem once for each function in this sequence and then using the Arzelà-Ascoli diagonalization argument, we can refine the subsequence so that $\int_{\mathbb{C}} f_j(z)\,d\mu_{\frac{1}{\sqrt{n}}A_n}(z)$ converges in probability to some limit for each $j$, and hence by a limiting argument $\int_{\mathbb{C}} g(z)\,d\mu_{\frac{1}{\sqrt{n}}A_n}(z)$ converges in probability to a limit for each test function $g$. By the Riesz representation theorem we conclude that along this subsequence, $\mu_{\frac{1}{\sqrt{n}}A_n}$ converges in probability to some limit $\mu$, which is also a probability measure by the tightness bounds in Lemma 1.9.



Applying Theorem 1.20 we conclude that for almost every $z$, the expression

$$\frac{1}{n} \log \det\Bigl(\bigl((\tfrac{1}{\sqrt{n}}A_n - zI) + \varepsilon_n I\bigr)\bigl((\tfrac{1}{\sqrt{n}}A_n - zI)^* + \varepsilon_n I\bigr)\Bigr) \qquad (8)$$

converges in probability to $2\int_{\mathbb{C}} \log |w - z|\,d\mu(w)$ along this sequence, for some $\varepsilon_n$ converging to zero. On the other hand, from the hypotheses and the theorem of Dozier and Silverstein (see Theorem B.1) we know that for almost every $z$, the expression (8) has an almost sure limit for the entire sequence of $n$. Combining the two facts we see that for almost every $z$, (8) in fact converges almost surely to $2\int_{\mathbb{C}} \log |w - z|\,d\mu(w)$ for all $n$. The claim now follows from another application of Theorem 1.20. □
1.24. Notation. The asymptotic notation is used under the assumption that $n \to \infty$, holding all other parameters fixed. Thus for instance, if we say that a quantity $a_{z,n}$ depending on $n$ and another parameter $z$ is equal to $o(1)$, this means that $a_{z,n}$ converges to zero as $n \to \infty$ for fixed $z$, but this convergence need not be uniform in $z$. As another example, the condition (3) is equivalent to asserting that $\|M_n\|_2 = O(n)$ as $n \to \infty$.



2. The replacement principle 



The first step toward Theorem 1.7 is the following result that gives a general criterion for two random matrix ensembles $\frac{1}{\sqrt{n}}A_n$, $\frac{1}{\sqrt{n}}B_n$ to converge to the same limit.

Theorem 2.1 (Replacement principle). Suppose for each $n$ that $A_n, B_n \in M_n(\mathbb{C})$ are ensembles of random matrices. Assume that

(i) The expression

$$\frac{1}{n^2}\|A_n\|_2^2 + \frac{1}{n^2}\|B_n\|_2^2 \qquad (9)$$

is bounded in probability (resp. almost surely).
(ii) For almost all complex numbers $z$,

$$\frac{1}{n} \log \bigl|\det(\tfrac{1}{\sqrt{n}}A_n - zI)\bigr| - \frac{1}{n} \log \bigl|\det(\tfrac{1}{\sqrt{n}}B_n - zI)\bigr|$$

converges in probability (resp. almost surely) to zero. In particular, for each fixed $z$, these determinants are non-zero with probability $1 - o(1)$ for all $n$ (resp. almost surely non-zero for all but finitely many $n$).

Then $\mu_{\frac{1}{\sqrt{n}}A_n} - \mu_{\frac{1}{\sqrt{n}}B_n}$ converges in probability (resp. almost surely) to zero.



We would like to remark here that we do not need to require independence among the entries of $A_n$ and $B_n$. The proof of this theorem is rather "soft" in nature, relying primarily on the Stieltjes transform technique (following Girko [7]) that analyses the ESD $\mu_{\frac{1}{\sqrt{n}}A_n}$ in terms of the log-determinants $\frac{1}{n} \log |\det(\frac{1}{\sqrt{n}}A_n - zI)|$, combined with tools from classical real analysis such as the dominated convergence theorem (see Lemma 3.1 for the precise version of this theorem that we need). The details are given in Section 3.

In view of Lemma 1.9, we see that Theorem 1.7 follows immediately from Theorem 2.1 and the following proposition.
Proposition 2.2 (Converging determinant). Let $x$ and $y$ be complex random variables with zero mean and unit variance. Let $X_n$ and $Y_n$ be $n \times n$ random matrices whose entries are iid copies of $x$ and $y$, respectively. For each $n$, let $M_n$ be a deterministic $n \times n$ matrix satisfying (3). Set $A_n := M_n + X_n$ and $B_n := M_n + Y_n$. Then for every fixed $z \in \mathbb{C}$,

$$\frac{1}{n} \log \bigl|\det(\tfrac{1}{\sqrt{n}}A_n - zI)\bigr| - \frac{1}{n} \log \bigl|\det(\tfrac{1}{\sqrt{n}}B_n - zI)\bigr| \qquad (10)$$

converges in probability to zero. If furthermore we assume that (4) converges to a limit for this value of $z$, then (10) converges almost surely to zero.

For any square matrix $A$ of size $n$, let $\lambda_i(A)$ and $s_i(A)$ be the eigenvalues and singular values of $A$. Furthermore, let $d_i(A)$ be the distance from the $i$th row vector of $A$ to the subspace formed by the first $i - 1$ row vectors. From linear algebra, we have the fundamental identity

$$|\det A| = \prod_{i=1}^n |\lambda_i(A)| = \prod_{i=1}^n s_i(A) = \prod_{i=1}^n d_i(A). \qquad (11)$$
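Identity (11) is straightforward to check numerically. The following sketch assumes NumPy; the helper name `det_three_ways` is illustrative. The row distances $d_i(A)$ are read off from a QR factorization of the transpose: the columns of $A^T$ are the rows of $A$, and $|R_{ii}|$ is the distance from the $i$th column to the span of the preceding ones.

```python
import numpy as np


def det_three_ways(A):
    """Return |det A| and the three products appearing in identity (11)."""
    eigenvalues = np.linalg.eigvals(A)
    singular_values = np.linalg.svd(A, compute_uv=False)
    # QR of the transpose: column i of A.T is row i of A, and |R[i, i]| is
    # its distance to the span of the earlier rows (Gram-Schmidt).
    R = np.linalg.qr(A.T, mode="r")
    distances = np.abs(np.diag(R))
    return (
        float(abs(np.linalg.det(A))),
        float(np.prod(np.abs(eigenvalues))),
        float(np.prod(singular_values)),
        float(np.prod(distances)),
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
    print(det_three_ways(A))
```

The distance formulation is the one that lets the rows be treated one at a time, which is how the three ingredients listed below divide up the work.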

We will need to study the singular values and distances of $\frac{1}{\sqrt{n}}A_n - zI$ and $\frac{1}{\sqrt{n}}B_n - zI$ in order to estimate their determinants. The proof of Proposition 2.2, which occupies Sections 4, 5 and 6, is the heart of the paper. This proof relies on the following three ingredients:

• A result by Dozier and Silverstein [3] that compares the ESD of the singular values of the matrices $\frac{1}{\sqrt{n}}A_n - zI$ and $\frac{1}{\sqrt{n}}B_n - zI$. This will let us handle all the rows from 1 to $(1 - \delta)n$ for some small $\delta > 0$.

• A lower tail estimate for the distance between a random vector and a fixed subspace of relatively large co-dimension, using a concentration inequality of Talagrand [13]. This will handle the contribution of the rows between $(1 - \delta)n$ and (say) $n - n^{0.99}$.

• A polynomial lower bound for the least singular value of $\frac{1}{\sqrt{n}}A_n - zI$ and $\frac{1}{\sqrt{n}}B_n - zI$ from [2S1I2Z]. This bound enables us to handle the contribution of the last $n^{0.99}$ rows.



3. The replacement principle 



The purpose of this section is to establish Theorem 2.1. We begin
with a version of the dominated convergence theorem.
Lemma 3.1 (Dominated convergence). Let $(X, \nu)$ be a finite measure space. For each integer $n \ge 1$, let $f_n : X \to \mathbb{R}$ be random functions which are jointly measurable with respect to $X$ and the underlying probability space. Assume that

(i) (Uniform integrability) There exists $\delta > 0$ such that $\int_X |f_n(x)|^{1+\delta}\,d\nu$ is bounded in probability (resp. almost surely).
(ii) (Pointwise convergence in probability) For $\nu$-almost every $x \in X$, $f_n(x)$ converges in probability (resp. almost surely) to zero.

Then $\int_X f_n(x)\,d\nu(x)$ converges in probability (resp. almost surely) to zero.

Proof. We first prove the claim for convergence in probability. We can 
normalise z/ to be a probability measure. Let e > be arbitrary. It 
suffices to show that 



fn{x) dv{x) = 0(e) 



X 



with probability 1 — 0{e) — o(l). 

By hypothesis (i), we already know that with probability 1 — 0(e) 
o(l), that 

/ \fnix)\'-^' duix) < C, 

Jx 
for some C^ depending on e. This implies that 

/„(x)I(|/„(a;)| > M) du{x) < CJM' 



X 



for any M > 0, where I(-E) denotes the indicator of an event E. In 
particular, for M large enough we have 



fn{x)l{\fn{x)\>M)du{x)<e, 
IX 

with probability 1 — 0{e) — o(l), and so it will suffice to show that 

/ /„(x)I(|/„(a;)| < M) du{x) = 0{e) (12) 

Jx 

with probability 1 — o(l). 

Fix M. By hypothesis, we have lim„_>oo P(|/n(a;)| > e) = for z/- 
almost every x G X. By the dominated convergence theorem, we 
conclude that 



[ Pi\fnix)\>e)duix) = oil). 
Jx 
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By Fubini's theorem, we conclude that 

E [ I{\Ux)\>6)du{x) = o{l) 
Jx 

and so by Markov's inequahty, we have 

I{\fn{x)\>e)du{x)=0{e/M) 



X 



with probabihty 1 — o(l). The claim (12) easily follows 



Now we prove the claim for almost sure convergence. Again we let $\nu$ be a probability measure and $\varepsilon > 0$ be arbitrary. With probability $1 - O(\varepsilon)$ we have

$$\int_X |f_n(x)|^{1+\delta}\,d\nu(x) \le C_\varepsilon$$

for all sufficiently large $n$, and some $C_\varepsilon$ depending on $\varepsilon$. Also, with probability 1, $f_n(x)$ converges to zero for almost every $x$. The claim now follows by invoking (the deterministic special case of) the convergence in probability version of the lemma that we have just proven. □
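The truncation step in the proof above rests on a one-line pointwise inequality. The following display is a sketch of that step, writing $C_\varepsilon$ for the constant supplied by hypothesis (i); the absolute values on the integrand are an assumption about the intended reading.

```latex
% On the set where |f_n(x)| >= M we have 1 <= (|f_n(x)|/M)^delta, hence
\[
  |f_n(x)|\, I(|f_n(x)| \ge M)
  \;\le\; |f_n(x)| \Big(\frac{|f_n(x)|}{M}\Big)^{\delta}
  \;=\; \frac{|f_n(x)|^{1+\delta}}{M^{\delta}} .
\]
% Integrating against nu and using the moment bound from hypothesis (i):
\[
  \int_X |f_n(x)|\, I(|f_n(x)| \ge M)\, d\nu(x)
  \;\le\; \frac{1}{M^{\delta}} \int_X |f_n(x)|^{1+\delta}\, d\nu(x)
  \;\le\; \frac{C_\varepsilon}{M^{\delta}} .
\]
```

Choosing $M \ge (C_\varepsilon/\varepsilon)^{1/\delta}$ then makes the right-hand side at most $\varepsilon$.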



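Now we begin the proof of Theorem 2.1. We thus assume that $A_n$, $B_n$ are as in that theorem. We shall first prove the claim for convergence in probability, and indicate later how to modify the proof to obtain the principle for almost sure convergence.

From the boundedness in probability of (9) and Weyl's comparison inequality (Lemma A.2) we see that for every $\varepsilon > 0$ there exists $C_\varepsilon > 0$ such that for each $n$, the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A_n$ obey the bound

$$\frac{1}{n^2} \sum_{j=1}^n |\lambda_j|^2 \le C_\varepsilon \qquad (13)$$

or equivalently that

$$\int_{\mathbb{C}} |z|^2\,d\mu_{\frac{1}{\sqrt{n}} A_n}(z) \le C_\varepsilon$$

with probability $1 - O(\varepsilon) - o(1)$. Similarly we have

$$\int_{\mathbb{C}} |z|^2\,d\mu_{\frac{1}{\sqrt{n}} B_n}(z) \le C_\varepsilon.$$

In particular, for each $n$ we see that with probability $1 - O(\varepsilon) - o(1)$ we have the tightness bounds

$$\mu_{\frac{1}{\sqrt{n}} A_n}\{z \in \mathbb{C} : |z| \ge R\} \le C_\varepsilon/R^2 \qquad (14)$$

and

$$\mu_{\frac{1}{\sqrt{n}} B_n}\{z \in \mathbb{C} : |z| \ge R\} \le C_\varepsilon/R^2 \qquad (15)$$

for all $R > 0$.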
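The tightness bounds are an instance of Markov's inequality applied to an empirical measure: a second-moment bound forces the mass outside a disk of radius $R$ to decay like $1/R^2$. The sketch below illustrates this on a hypothetical point cloud; the Gaussian sample is only a stand-in for normalized eigenvalues and is not taken from the paper.

```python
import random

def tail_mass(points, R):
    """Fraction of points z with |z| >= R (mass of the empirical measure outside the disk)."""
    return sum(1 for z in points if abs(z) >= R) / len(points)

def second_moment(points):
    """Empirical second moment: the average of |z|^2 over the points."""
    return sum(abs(z) ** 2 for z in points) / len(points)

rng = random.Random(0)
# Hypothetical stand-in for the normalized eigenvalues lambda_j / sqrt(n).
points = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]

C = second_moment(points)
# Markov's inequality: tail_mass(R) * R^2 <= average of |z|^2, for every R > 0.
bounds_hold = all(tail_mass(points, R) <= C / R ** 2 for R in (0.5, 1.0, 2.0, 3.0))
print(bounds_hold)
```

The inequality is deterministic, so it holds for any point set; randomness enters the paper's argument only through the event on which the second moment is bounded by $C_\varepsilon$.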
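We now take the standard step of passing from the ESDs $\mu_{\frac{1}{\sqrt{n}} A_n}$, $\mu_{\frac{1}{\sqrt{n}} B_n}$ to the characteristic functions $m_{\frac{1}{\sqrt{n}} A_n}, m_{\frac{1}{\sqrt{n}} B_n} : \mathbb{R}^2 \to \mathbb{C}$, which are defined by the formulae

$$m_{\frac{1}{\sqrt{n}} A_n}(u,v) := \int_{\mathbb{C}} e^{iu\,\mathrm{Re}(z) + iv\,\mathrm{Im}(z)}\,d\mu_{\frac{1}{\sqrt{n}} A_n}(z), \qquad m_{\frac{1}{\sqrt{n}} B_n}(u,v) := \int_{\mathbb{C}} e^{iu\,\mathrm{Re}(z) + iv\,\mathrm{Im}(z)}\,d\mu_{\frac{1}{\sqrt{n}} B_n}(z);$$

thus the functions $m_{\frac{1}{\sqrt{n}} A_n}$, $m_{\frac{1}{\sqrt{n}} B_n}$ are continuous and are bounded uniformly in magnitude by 1.

Thanks to the tightness bounds (14)-(15), we can easily pass back and forth between convergence of ESDs and convergence of characteristic functions: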

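Lemma 3.2. Let the notation and assumptions be as above. Then the following are equivalent:

(i) $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to zero.

(ii) For almost every $u, v$, $m_{\frac{1}{\sqrt{n}} A_n}(u,v) - m_{\frac{1}{\sqrt{n}} B_n}(u,v)$ converges in probability to zero.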

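Proof. We first show that (i) implies (ii). Fix $u, v$, and let $\varepsilon > 0$ be arbitrary. From (14), (15) we can find an $R$ depending on $C_\varepsilon$ and $\varepsilon$ such that

$$\mu_{\frac{1}{\sqrt{n}} A_n}(\{z \in \mathbb{C} : |z| \ge R\}) + \mu_{\frac{1}{\sqrt{n}} B_n}(\{z \in \mathbb{C} : |z| \ge R\}) \le \varepsilon$$

with probability $1 - O(\varepsilon) - o(1)$. In particular, with probability $1 - O(\varepsilon) - o(1)$ we have

$$m_{\frac{1}{\sqrt{n}} A_n}(u,v) - m_{\frac{1}{\sqrt{n}} B_n}(u,v) = \int_{\mathbb{C}} \psi(z/R)\,e^{iu\,\mathrm{Re}(z) + iv\,\mathrm{Im}(z)}\,[d\mu_{\frac{1}{\sqrt{n}} A_n}(z) - d\mu_{\frac{1}{\sqrt{n}} B_n}(z)] + O(\varepsilon)$$

where $\psi$ is any smooth compactly supported function that equals one on the unit ball. But since $\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to zero, the integral here converges to zero in probability. The claim follows.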

Now we prove that (ii) implies (i). Since continuous compactly supported functions are the uniform limit of smooth compactly supported functions, it suffices to show that $\int_{\mathbb{C}} f\,d\mu_{\frac{1}{\sqrt{n}} A_n} - \int_{\mathbb{C}} f\,d\mu_{\frac{1}{\sqrt{n}} B_n}$ converges in probability to zero for every smooth compactly supported function $f : \mathbb{C} \to \mathbb{C}$.

Now fix a smooth compactly supported function $f : \mathbb{C} \to \mathbb{C}$. By Fourier analysis, we can write

$$\int_{\mathbb{C}} f\,d\mu_{\frac{1}{\sqrt{n}} A_n} - \int_{\mathbb{C}} f\,d\mu_{\frac{1}{\sqrt{n}} B_n} = \int_{\mathbb{R}} \int_{\mathbb{R}} \hat f(u,v)\,\big(m_{\frac{1}{\sqrt{n}} A_n}(u,v) - m_{\frac{1}{\sqrt{n}} B_n}(u,v)\big)\,du\,dv \qquad (16)$$

for some smooth, rapidly decreasing function $\hat f$. In particular, the measure $d\nu = |\hat f(u,v)|\,du\,dv$ is finite. The claim now follows from dominated convergence (Lemma 3.1); note that the function $m_{\frac{1}{\sqrt{n}} A_n} - m_{\frac{1}{\sqrt{n}} B_n}$ is bounded and so clearly obeys the moment condition required in that lemma. □



In view of the above lemma, it suffices to show that $m_{\frac{1}{\sqrt{n}} A_n}(u,v) - m_{\frac{1}{\sqrt{n}} B_n}(u,v)$ converges in probability to zero for almost every $u, v \in \mathbb{R}$.

Fix $u, v$. Since we can exclude a set of measure zero, we can assume that $u, v$ are non-zero. We allow all implied constants in the arguments below to depend on $u, v$.

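Following Girko [7], we now proceed via the Stieltjes-like transform $g_{\frac{1}{\sqrt{n}} A_n} : \mathbb{C} \to \mathbb{R}$, defined almost everywhere by the formula

$$g_{\frac{1}{\sqrt{n}} A_n}(z) := 2\,\mathrm{Re} \int_{\mathbb{C}} \frac{z - w}{|z - w|^2}\,d\mu_{\frac{1}{\sqrt{n}} A_n}(w) = \frac{2}{n} \sum_{j=1}^n \frac{\mathrm{Re}(z - \frac{1}{\sqrt{n}} \lambda_j)}{|z - \frac{1}{\sqrt{n}} \lambda_j|^2}; \qquad (17)$$

observe that this is a locally integrable function on $\mathbb{C}$, and that

$$g_{\frac{1}{\sqrt{n}} A_n}(z) = \frac{\partial}{\partial\,\mathrm{Re}(z)}\,\frac{2}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n - zI\Big)\Big| \qquad (18)$$

for all but finitely many $z$.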

We have the following fundamental identity: 
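Lemma 3.3 (Girko's identity). [7] For every non-zero $u, v$ we have

$$m_{\frac{1}{\sqrt{n}} A_n}(u,v) = \frac{u^2 + v^2}{4\pi i u} \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} g_{\frac{1}{\sqrt{n}} A_n}(s + it)\,e^{ius + ivt}\,dt \Big)\,ds,$$

where the inner integral is absolutely integrable for almost every $s$, and the outer integral is absolutely convergent.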

Proof. We argue as in ^ Lemma 3.1]. Since 

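$$m_{\frac{1}{\sqrt{n}} A_n}(u,v) = \frac{1}{n} \sum_{j=1}^n e^{iu\,\mathrm{Re}(\lambda_j/\sqrt{n}) + iv\,\mathrm{Im}(\lambda_j/\sqrt{n})},$$

it suffices from (17) to show that

$$e^{i(u\,\mathrm{Re}(w) + v\,\mathrm{Im}(w))} = \frac{u^2 + v^2}{2\pi i u} \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} \frac{\mathrm{Re}(s + it - w)}{|s + it - w|^2}\,e^{ius + ivt}\,dt \Big)\,ds$$

for each complex number $w$, with an absolutely convergent inner integral and outer integral. But standard contour integration shows that

$$\int_{\mathbb{R}} \frac{\mathrm{Re}(s + it - w)}{|s + it - w|^2}\,e^{ius + ivt}\,dt = \pi\,\mathrm{sgn}(s - \mathrm{Re}(w))\,e^{-|v||s - \mathrm{Re}(w)|}\,e^{ius}\,e^{iv\,\mathrm{Im}(w)} \qquad (19)$$

for every $s \ne \mathrm{Re}(w)$, and the claim follows by an elementary integration. □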
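The "elementary integration" that closes the proof can be written out as follows. This is a sketch of the computation, with the shorthand $a := \mathrm{Re}(w)$ and $b := \mathrm{Im}(w)$ introduced here for readability.

```latex
% Integrate the right-hand side of (19) in s, substituting x = s - a.
\[
  \int_{\mathbb{R}} \operatorname{sgn}(s-a)\, e^{-|v||s-a|}\, e^{ius}\, ds
  = e^{iua} \int_{\mathbb{R}} \operatorname{sgn}(x)\, e^{-|v||x|}\, e^{iux}\, dx
  = 2i\, e^{iua} \int_0^\infty e^{-|v|x} \sin(ux)\, dx
  = \frac{2iu}{u^2+v^2}\, e^{iua} .
\]
% Multiplying by the remaining factor pi * e^{ivb} from (19):
\[
  \int_{\mathbb{R}} \Big( \int_{\mathbb{R}}
     \frac{\operatorname{Re}(s+it-w)}{|s+it-w|^2}\, e^{ius+ivt}\, dt \Big) ds
  = \frac{2\pi i u}{u^2+v^2}\, e^{i(ua+vb)} ,
\]
% which rearranges to the identity required for each fixed w.
```

The middle step uses that $\operatorname{sgn}(x) e^{-|v||x|}$ is odd, so only the $\sin(ux)$ part of $e^{iux}$ survives, together with the standard integral $\int_0^\infty e^{-px}\sin(ux)\,dx = u/(p^2+u^2)$ for $p > 0$; this is where the assumption $v \ne 0$ is used.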



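We can of course define $g_{\frac{1}{\sqrt{n}} B_n}$ similarly, with analogous identities. To conclude the proof of Theorem 2.1, it thus suffices to show that for any $\varepsilon > 0$ and any $n$, we have

$$\int_{\mathbb{R}} \Big( \int_{\mathbb{R}} \big(g_{\frac{1}{\sqrt{n}} A_n}(s + it) - g_{\frac{1}{\sqrt{n}} B_n}(s + it)\big)\,e^{ius + ivt}\,dt \Big)\,ds = O(\varepsilon) \qquad (20)$$

with probability $1 - O(\varepsilon) - o(1)$.

Fix $\varepsilon > 0$. By (14), (15), we can find an $R > 1$ large enough that with probability $1 - O(\varepsilon)$

$$\mu_{\frac{1}{\sqrt{n}} A_n}(\{z \in \mathbb{C} : |z| \ge R\}) + \mu_{\frac{1}{\sqrt{n}} B_n}(\{z \in \mathbb{C} : |z| \ge R\}) \le \varepsilon. \qquad (21)$$

We now condition on the event that (21) holds.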



We now smoothly localize the $z$ variable to a compact set as follows. Let $\psi : \mathbb{R} \to \mathbb{R}^+$ be a smooth cutoff function which equals 1 on $[-1, 1]$ and is supported on $[-2, 2]$.

Lemma 3.4 (Truncation in s,t). Let tt; G C 
(i) The integral 

\w — [s + zt)p 

is of size 0(1), and (if R is large enough) is of size 0(e) when 
\w\ < R. 
(ii) The integral 



Re(w-(g + zt)) ^.^,^.^, 
\w — {s + zt)p 



(1 - i){t/R^)) dt\i){s/R^) ds (22) 



is of size 0(1), and (if R is large enough) is of size 0(e) when 
\w\ < R. 
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Proof. The claim (i) follows easily from (19), so we turn to (ii). We 
first verify the claim that (22) is bounded. Replacing everything by 
absolute values one sees that 



Re{w- (s + it)) 
\w — {s + zt)P 



ius+ivt 



{l-i){t/R^)) dt\ =0(1) 



(in fact one can obtain an explicit upper bound of tt), so we can dis- 
pose of the region of integration in which s = R.e{w) + 0(1). For the 
remaining values of s, we use repeated integration by parts, integrating 
the e™* term and differentiating the others. After two such integrations 
we obtain the bound 



Re(w 



zt)) 



ius+ivt 



\w — {s + it)|2 
The claim then follows. 



{l-^{t/R')) dt\ = 0{{R-'+\s-Re{w)\-y) 



Finally, if |u7| < i?, then one easily verifies (by repeated integration 
by parts) that 



Re(M; - (s + it)) 



ius+ivt/ 



\w — {s + zt)p 
(say), and so the final claim of (ii) follows. 



i^{t/R^)) dt = 0{1/R^ 



n 



From this lemma and (17), the triangle inequality and (21) we con- 
clude that 



g^AAs + it)e 



ius+ivt 



dt){l-i){s/R^))ds = 0{e). 



(23) 



and 



( / Qj^aS^ + ^t)e^"^+^^*(l - i^{t/R')) dt)^{s/R')ds = 0{e). (24) 



From (23), (24) (and their counterparts for gj_B^) and the triangle 



inequality, we thus see that to prove ( 20 ) , it suffices to show that 

{g.^^is + tt)-g.B^is + tt))e'-^^''"i;it/R')i;{s/R') dtds (25) 




converges in probability to zero for every fixed R > 1. Note that the 
integrands here are now jointly absolutely integrable in t, s, and so we 
may now freely interchange the order of integration. 



Fix R. Using (18) and integration by parts in the s variable, we can 



rewrite (25) in the form 




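$$\int_{\mathbb{R}} \int_{\mathbb{R}} f_n(s,t)\,\phi_{u,v,R}(s,t)\,ds\,dt$$

where

$$f_n(s,t) := \frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n - zI\Big)\Big| - \frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} B_n - zI\Big)\Big|, \qquad z = s + it,$$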



and 

(Note that there are finitely many values of $t$ for which the integration by parts is not justified due to singularities in $g_{\frac{1}{\sqrt{n}} A_n}$ or $g_{\frac{1}{\sqrt{n}} B_n}$, but these values of $t$ clearly give a zero contribution at the end of the day.) Thus it will suffice to show that

$$\int_{\mathbb{R}} \int_{\mathbb{R}} |f_n(s,t)|\,|\phi_{u,v,R}(s,t)|\,ds\,dt$$

converges in probability to zero.



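From (11) we have

$$\frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n - zI\Big)\Big| = \frac{1}{n} \sum_{j=1}^n \log \Big|\frac{1}{\sqrt{n}} \lambda_j - (s + it)\Big| \qquad (26)$$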

and similarly for $B_n$. From the boundedness and compact support of $\phi_{u,v,R}$ we observe that

$$\int_{\mathbb{R}} \int_{\mathbb{R}} \Big| \log \Big|\frac{1}{\sqrt{n}} \lambda - (s + it)\Big| \Big|^2\,|\phi_{u,v,R}(s,t)|\,ds\,dt \le O_{u,v,R}\Big(1 + \frac{1}{n} |\lambda|^2\Big)$$

for all $\lambda \in \mathbb{C}$; from this, (26), (13), and the triangle inequality we see that

$$\int_{\mathbb{R}} \int_{\mathbb{R}} |f_n(s,t)|^2\,|\phi_{u,v,R}(s,t)|\,ds\,dt \qquad (27)$$

is bounded uniformly in $n$. Since by hypothesis $f_n(s,t)$ converges in probability to zero for almost every $s, t$, the claim now follows from dominated convergence (Lemma 3.1). The proof of Theorem 2.1 is now complete in the case of convergence in probability.

3.5. The almost sure convergence case. We now indicate how to adapt the above arguments to the case of almost sure convergence. Firstly, since (9) is now almost surely bounded instead of just bounded in probability, we can now say that for every $\varepsilon > 0$ there exists $C_\varepsilon > 0$ such that with probability $1 - O(\varepsilon)$, (14), (15) hold for all sufficiently large $n$ (as opposed to these bounds holding with probability $1 - O(\varepsilon) - o(1)$ for each $n$ separately).



Next, we observe the (well-known) fact that Lemma 3.2 continues to hold when convergence in probability is replaced by almost sure convergence throughout. Indeed the implication of (ii) from (i) is nearly identical and is left as an exercise to the reader. To deduce (i) from (ii) in the almost sure case, observe from the separability of the space of smooth compactly supported functions in the uniform topology that it suffices to show that (16) converges almost surely to zero for each $f$. On the other hand, from (ii) and Fubini's theorem we know that with probability 1, $m_{\frac{1}{\sqrt{n}} A_n}(u,v) - m_{\frac{1}{\sqrt{n}} B_n}(u,v)$ converges to zero for almost every $u, v$, and the claim follows from the (ordinary) dominated convergence theorem.



Once again we use Girko's identity, Lemma 3.3, and reduce to showing that for every $\varepsilon > 0$, one has with probability $1 - O(\varepsilon)$ that (20) holds for all but finitely many $n$. From our bounds on (14), (15) we see that with probability $1 - O(\varepsilon)$, (21) holds for all but finitely many $n$. We apply Lemma 3.4 (which is deterministic) and reduce to showing that (25) converges almost surely to zero for each fixed $R > 1$. The rest of the argument proceeds as in the convergence in probability case.



3.6. An alternate argument. There is an alternate derivation of Theorem 2.1 that avoids Fourier analysis, and is instead based on the observation that for any complex polynomial $P(z)$, the distributional Laplacian $\Delta \log |P(z)|$ of the logarithm of the magnitude of $P$ is equal to $2\pi$ times the counting measure of the zeroes of $P$ (counting multiplicity). In particular, we see from Green's theorem that

$$\int_{\mathbb{C}} f\,d\big(\mu_{\frac{1}{\sqrt{n}} A_n} - \mu_{\frac{1}{\sqrt{n}} B_n}\big) = \frac{1}{2\pi n} \int_{\mathbb{C}} (\Delta f(z)) \Big( \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n - zI\Big)\Big| - \log \Big|\det\Big(\frac{1}{\sqrt{n}} B_n - zI\Big)\Big| \Big)\,dz$$

for any smooth, compactly supported $f$. Applying Lemma 3.1 we can then get convergence of this integral (either in probability or in the almost sure sense, as appropriate); the uniform integrability required can be established by repeating the computations used to bound (27). One can then easily take limits to replace smooth compactly supported $f$ by continuous compactly supported $f$; we omit the details.
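The Green's theorem step can be unpacked as follows. This is a sketch under the standard normalization $\Delta \log|z| = 2\pi\delta_0$ in the sense of distributions, with $dz$ denoting Lebesgue measure on $\mathbb{C}$ and $\lambda_1, \ldots, \lambda_n$ the eigenvalues of $A_n$.

```latex
% The determinant factors over the normalized eigenvalues:
\[
  \log\Big|\det\Big(\tfrac{1}{\sqrt n}A_n - zI\Big)\Big|
  = \sum_{j=1}^n \log\Big|z - \tfrac{1}{\sqrt n}\lambda_j\Big| .
\]
% For smooth compactly supported f, integrating by parts twice gives
\[
  \int_{\mathbb{C}} (\Delta f(z))\, \log|z-\lambda|\, dz = 2\pi f(\lambda)
  \qquad \text{for each } \lambda \in \mathbb{C},
\]
% and summing over the eigenvalues,
\[
  \frac{1}{2\pi n} \int_{\mathbb{C}} (\Delta f(z))\,
     \log\Big|\det\Big(\tfrac{1}{\sqrt n}A_n - zI\Big)\Big|\, dz
  = \frac{1}{n}\sum_{j=1}^n f\Big(\tfrac{1}{\sqrt n}\lambda_j\Big)
  = \int_{\mathbb{C}} f\, d\mu_{\frac{1}{\sqrt n}A_n} .
\]
```

Subtracting the same identity for $B_n$ gives the displayed formula, so the comparison of ESDs again reduces to a comparison of log-determinants.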



4. Proof of Proposition 2.2

In this section we present the proof of Proposition 2.2, modulo several key lemmas. Let $x, y, M_n, A_n, B_n, z$ be as in that proposition. By shifting $M_n$ by $\sqrt{n} zI$ if necessary we can assume $z = 0$. Our task is now to show that

$$\frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n\Big)\Big| - \frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} B_n\Big)\Big|$$

converges in probability to zero, and also almost surely to zero if $\mu_{\frac{1}{n} M_n M_n^*}$ converges.



We thank Manjunath Krishnapur for this simpler argument. 
Let us first remark that the almost sure convergence claim implies the convergence in probability claim. Indeed, suppose that convergence in probability failed; then there would exist an $\varepsilon > 0$ such that

$$P\Big( \Big| \frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n\Big)\Big| - \frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} B_n\Big)\Big| \Big| \ge \varepsilon \Big) \ge \varepsilon \qquad (28)$$

for a subsequence of $n$. By vague sequential compactness one can pass to a further subsequence along which $\mu_{\frac{1}{n} M_n M_n^*}$ converges, and hence by hypothesis one has almost sure (and hence in probability) convergence to zero along this sequence, contradicting (28). Thus it suffices to establish almost sure convergence assuming the convergence of $\mu_{\frac{1}{n} M_n M_n^*}$.

Let $Z_1, \ldots, Z_n$ be the rows of $M_n$. By assumption we have

$$\sum_{i=1}^n \|Z_i\|^2 = O(n^2).$$

In particular, at least half of the $Z_i$ have norm $O(\sqrt{n})$. By permuting the rows of $M_n$, $A_n$, $B_n$ if necessary, we may assume that the last half of the rows have this property, thus

$$\|Z_i\| = O(\sqrt{n}) \quad \text{for all } n/2 \le i \le n. \qquad (29)$$

Let $\sigma_1(A) \ge \ldots \ge \sigma_n(A) \ge 0$ denote the singular values of a matrix $A$. We have the following fundamental lower bound:

Lemma 4.1 (Least singular value bound). With probability 1, we have

$$\sigma_n(A_n), \sigma_n(B_n) \ge n^{-O(1)} \qquad (30)$$

for all but finitely many $n$. In particular, with probability 1, $A_n$ and $B_n$ are invertible for all but finitely many $n$.

Proof. This follows immediately from [26, Theorem 2.1] or [27, Theorem 4.1] and the Borel-Cantelli lemma, noting from (3) of Proposition 2.2 that the operator norm of $M_n$ is of polynomial size $n^{O(1)}$. There are previous results in [17], [21], [18], [25], which handled special cases with more assumptions on $M_n$ and the underlying distributions $x, y$ (for instance, in some of the prior results $M_n$ was assumed to vanish, or $x, y$ were assumed to be integer-valued or to have finite higher moments). One can obtain explicit bounds on the tail probability and on
the exponent 0(1); see [2Z]. However, for our applications the above 
bounds will suffice. D 



We also have with probability 1 the crude upper bound

$$\sigma_1(A_n), \sigma_1(B_n) \le n^{O(1)} \qquad (31)$$

for all but finitely many $n$, which follows easily from the polynomial size of $M_n$, the bounded second moment of $x, y$, and the Borel-Cantelli lemma. Again, much sharper bounds are available, especially if $x$ and $y$ have finite fourth moment, but we will not need these bounds here.

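Let $X_1, \ldots, X_n$ be the rows of $A_n$, and for each $1 \le i \le n$ let $V_i$ be the $(i-1)$-dimensional space generated by $X_1, \ldots, X_{i-1}$. From (11) we have

$$\frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} A_n\Big)\Big| = \frac{1}{n} \sum_{i=1}^n \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big)$$

and similarly

$$\frac{1}{n} \log \Big|\det\Big(\frac{1}{\sqrt{n}} B_n\Big)\Big| = \frac{1}{n} \sum_{i=1}^n \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big)$$

where $Y_1, \ldots, Y_n$ are the rows of $B_n$, and $W_i$ is spanned by $Y_1, \ldots, Y_{i-1}$. Our task is then to show that

$$\frac{1}{n} \sum_{i=1}^n \Big[ \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) - \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big) \Big]$$

converges almost surely to zero.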
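The identity between the log-determinant and the sum of log-distances is the "base times height" formula for the volume of a parallelepiped, applied row by row. The following sketch checks it numerically for a small real Gaussian matrix; the helper names and the choice of a real $6 \times 6$ test matrix are illustrative assumptions, not part of the paper.

```python
import math
import random

def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

def log_abs_det_via_distances(rows):
    """Sum over i of log dist(X_i, span(X_1, ..., X_{i-1})), via Gram-Schmidt."""
    basis = []  # orthonormal basis of the span of the rows processed so far
    total = 0.0
    for x in rows:
        r = list(x)
        for q in basis:
            c = dot(r, q)
            r = [a - c * b for a, b in zip(r, q)]
        d = math.sqrt(dot(r, r))  # distance from x to the span of the earlier rows
        total += math.log(d)
        basis.append([a / d for a in r])
    return total

def log_abs_det_direct(rows):
    """log|det| computed by Gaussian elimination with partial pivoting."""
    a = [list(r) for r in rows]
    n = len(a)
    total = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]  # a row swap only changes the sign of det
        total += math.log(abs(a[k][k]))
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
    return total

rng = random.Random(1)
n = 6
A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n)]
lhs = log_abs_det_direct(A)
rhs = log_abs_det_via_distances(A)
print(abs(lhs - rhs) < 1e-9)
```

Rescaling every row by $1/\sqrt{n}$ shifts both sides by the same amount, $-\frac{n}{2}\log n$, so the normalized form used in the text follows from the unnormalized one.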



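From (30), (31) and Lemma A.4 we almost surely obtain the bound

$$\log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), \ \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big) = O(\log n)$$

for all but finitely many $n$. Thus it suffices to show that

$$\frac{1}{n} \sum_{1 \le i \le n - n^{0.99}} \Big[ \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) - \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big) \Big]$$

(say) converges almost surely to zero. This follows immediately from the following two lemmas.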

Lemma 4.2 (High-dimensional contribution). For every $\varepsilon > 0$ there exists $0 < \delta < 1/2$ such that with probability 1, one has

$$\frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} \Big| \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) \Big| = O(\varepsilon)$$

for all but finitely many $n$. Similarly with $\mathrm{dist}(\frac{1}{\sqrt{n}} X_i, V_i)$ replaced by $\mathrm{dist}(\frac{1}{\sqrt{n}} Y_i, W_i)$.

Lemma 4.3 (Low-dimensional contribution). For every $\varepsilon > 0$ there exists $0 < \delta < 1/2$ such that with probability $1 - O(\varepsilon)$, one has

$$\frac{1}{n} \sum_{1 \le i \le (1-\delta)n} \Big[ \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) - \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} Y_i, W_i\Big) \Big] = O(\varepsilon)$$

for all but finitely many $n$.

The next two sections will be devoted to the proofs of these two lemmas.



5. Proof of Lemma 4.2

We now prove Lemma 4.2. We can of course take $n$ to be large depending on all fixed parameters. Let $0 < \delta < 1/2$ be a small number depending on $\varepsilon$ to be chosen later.

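Clearly it suffices to prove this lemma for $\mathrm{dist}(\frac{1}{\sqrt{n}} X_i, V_i)$. We first prove the (much easier) bound for the positive component of the logarithm. By the Borel-Cantelli lemma it suffices to show that

$$\sum_{n=1}^\infty P\Big( \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} \max\Big(\log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big) \ge \varepsilon \Big) < \infty.$$

To establish this, we use the crude bound

$$\max\Big(\log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big) \le \max\Big(\log \frac{1}{\sqrt{n}} \|X_i\|, 0\Big)$$

and thus

$$\frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} \max\Big(\log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big) \le O\Big( \sum_{m=0}^\infty \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} I(\|X_i\| \ge 2^m \sqrt{n}) \Big). \qquad (32)$$

Thus if the left-hand side of (32) exceeds $\varepsilon$, we must have

$$\frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} I(\|X_i\| \ge 2^m \sqrt{n}) \ge \varepsilon/(100 + m)^2$$

(say) for some $m \ge 0$. On the other hand, from (29) and the second moment method we see that $P(\|X_i\| \ge 2^m \sqrt{n}) = O(2^{-2m})$, and thus by Hoeffding's inequality we have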

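$$P\Big( \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} I(\|X_i\| \ge 2^m \sqrt{n}) \ge \varepsilon/(100 + m)^2 \Big) \le C \exp(-c n^{0.01} - cm)$$

(say) for some constants $C, c > 0$ depending on $\varepsilon$, if $\delta$ is chosen sufficiently small depending on $\varepsilon$. The claim follows.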
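The dyadic decomposition behind (32) uses the elementary fact that, for $a > 0$, the positive part of $\log a$ is at most $\log 2$ times the number of integers $m \ge 0$ with $a \ge 2^m$. The sketch below checks this on a few hand-picked values; the sample values and the cutoff `max_m` are arbitrary illustrative choices.

```python
import math

def positive_log(a):
    """Positive part of the logarithm: max(log a, 0)."""
    return max(math.log(a), 0.0)

def dyadic_count(a, max_m=200):
    """Number of integers m in [0, max_m) with a >= 2**m."""
    return sum(1 for m in range(max_m) if a >= 2.0 ** m)

# If 2**k <= a < 2**(k+1) with k >= 0, then log a < (k + 1) log 2 and the count is k + 1.
samples = [0.3, 1.0, 1.5, 2.0, 7.9, 8.0, 1e6]
ok = all(positive_log(a) <= math.log(2) * dyadic_count(a) for a in samples)
print(ok)
```

Applying this with $a = \|X_i\|/\sqrt{n}$ and averaging over $i$ turns the sum of positive logarithms into a sum over $m$ of averaged indicators, which is the shape of the right-hand side of (32).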

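It remains to establish the bound for the negative component of the logarithm. By the Borel-Cantelli lemma it suffices to show that

$$\sum_{n=1}^\infty P\Big( \frac{1}{n} \sum_{(1-\delta)n \le i \le n - n^{0.99}} \max\Big(-\log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big), 0\Big) \ge \varepsilon \Big) < \infty.$$

This will follow from the union bound and the following estimate.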
Proposition 5.1 (Lower tail bound). Let $1 \le d \le n - n^{0.99}$ and $0 < c < 1$, and let $W$ be a (deterministic) $d$-dimensional subspace of $\mathbb{C}^n$. Let $X$ be a row of $A_n$ (the exact choice of row is not important). Then

$$P(\mathrm{dist}(X, W) \le c\sqrt{n - d}) = O(\exp(-n^{0.01})).$$

(The implied constant of course depends on $c$.)

Indeed, since $X_i$ and $V_i$ are independent of each other, the proposition implies that

$$\mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) \ge \frac{1}{2\sqrt{n}} \sqrt{n - i + 1}$$

(say) for each $(1-\delta)n \le i \le n - n^{0.99}$, with probability $1 - O(n^{-10})$ (say). Setting $\delta$ sufficiently small (compared to $\varepsilon$), taking logarithms and summing in $i$ and $n$ one obtains the claim.

It remains to prove the proposition. Similar lower bounds concerning the distance of a random vector to a fixed subspace have appeared in [22], [18], [19]. Here, however, we have the complication that the coefficients of $X$ have non-zero mean and have no higher moment bounds than the second moment; in particular, they can be unbounded.

We first eliminate the problem that $X$ has non-zero mean. Write $X = v + X'$, where $v := E(X)$ is a deterministic vector (which could be quite large) and $X'$ has mean zero. Then we have $\mathrm{dist}(X, W) \ge \mathrm{dist}(X', \mathrm{span}(W, v))$. Thus Proposition 5.1 follows from the mean zero case (after making the harmless change of incrementing $d$ to $d+1$, and adjusting the parameters slightly to suit this).

Henceforth we assume that $X$ has mean zero, thus $X = (x_1, \ldots, x_n)$ for some iid copies $x_1, \ldots, x_n$ of $x$. Now we deal with the problem that the $x_1, \ldots, x_n$ can be unbounded. By Chebyshev's inequality, we have $P(|x_i| \ge n^{0.1}) = O(n^{-0.2})$ for all $1 \le i \le n$. The events $|x_i| \ge n^{0.1}$ are jointly independent in $i$. By the Chernoff inequality (see, for instance, [22, Chapter 1]), we can show that with probability $1 - O(\exp(-n^{0.01}))$, there are at most $n^{0.9}$ indices $i$ for which $|x_i| \ge n^{0.1}$. (One can also verify this directly using binomial coefficients and Stirling's formula.)

By conditioning on the various possible sets of indices for which $|x_i| \ge n^{0.1}$, we see that it suffices to show that

$$P(\mathrm{dist}(X, W) \le c\sqrt{n - d} \mid E_I) = O(\exp(-n^{0.01}))$$

for each $I \subset \{1, \ldots, n\}$ of cardinality at most $n^{0.9}$, where $E_I$ is the event that $I = \{1 \le i \le n : |x_i| \ge n^{0.1}\}$.
Without loss of generality we can take $I = \{n' + 1, \ldots, n\}$ for some $n - n^{0.9} \le n' \le n$. We then observe that

$$\mathrm{dist}(X, W) \ge \mathrm{dist}(\pi(X), \pi(W))$$

where $\pi : \mathbb{C}^n \to \mathbb{C}^{n'}$ is the orthogonal projection. By conditioning on the coordinates $x_{n'+1}, \ldots, x_n$ and making the minor change of replacing $n$ with $n'$ (and adjusting $c$ slightly), we may thus reduce to the case when $I$ is empty, thus it suffices to show that

$$P(\mathrm{dist}(X, W) \le c\sqrt{n - d} \mid |x_i| < n^{0.1} \text{ for all } i) = O(\exp(-n^{0.01})).$$

Let $\tilde{x}$ be the random variable $x$ conditioned to the event $|x| < n^{0.1}$, and let $\tilde{X} = (\tilde{x}_1, \ldots, \tilde{x}_n)$ be a vector consisting of iid copies of $\tilde{x}$. It then suffices to show that

$$P(\mathrm{dist}(\tilde{X}, W) \le c\sqrt{n - d}) = O(\exp(-n^{0.01})). \qquad (33)$$

Note that $\tilde{x}$ might have a non-zero mean, but this can be easily dealt with by the same trick used before, subtracting $E\tilde{x}$ from $\tilde{x}$ to make it have zero mean. Since $x$ had variance 1, we see from monotone convergence that $\tilde{x}$ has variance $1 - o(1)$.



To prove (33), we recall the following inequality of Talagrand.

Theorem 5.2 (Talagrand's inequality). Let $D$ be the unit disk $\{z \in \mathbb{C} : |z| \le 1\}$. For every product probability $\mu$ on $D^n$, every convex 1-Lipschitz function $F : \mathbb{C}^n \to \mathbb{R}$, and every $r \ge 0$,

$$\mu(|F - M(F)| \ge r) \le 4\exp(-r^2/8),$$

where $M(F)$ denotes the median of $F$.



Proof. This is the complex version of p^ Corollary 4.10], in which D 
was replaced by the unit interval $[0, 1]$. The proof is the same, with a slight modification that yields a worse constant (1/8 instead of 1/4) in the exponent. □

We apply this theorem with $\mu$ equal to the distribution of $\tilde{X}/n^{0.1}$ and $F : \mathbb{C}^n \to \mathbb{R}$ equal to the convex 1-Lipschitz function $F(v) := \mathrm{dist}(v, W)$, and conclude that

$$P(|\mathrm{dist}(\tilde{X}, W) - M(\mathrm{dist}(\tilde{X}, W))| \ge r n^{0.1}) \le 4\exp(-r^2/8) \qquad (34)$$

for every $r \ge 0$. On the other hand, we can easily compute the second moment (cf. [22, Lemma 2.5]):

Lemma 5.3. We have

$$E(\mathrm{dist}(\tilde{X}, W)^2) = (1 - o(1))(n - d).$$



TERENCE TAO, VAN VU, AND MANJUNATH KRISHNAPUR (APPENDIX)

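Proof. Let $\pi = (\pi_{ij})_{1 \le i,j \le n}$ be the orthogonal projection matrix onto the orthogonal complement of $W$. Observe that $\mathrm{dist}(\tilde{X}, W)^2 = \sum_{i=1}^n \sum_{j=1}^n \tilde{x}_i \pi_{ij} \overline{\tilde{x}_j}$. Since the $\tilde{x}_i$ are iid with mean zero, we thus have

$$E(\mathrm{dist}(\tilde{X}, W)^2) = (E|\tilde{x}|^2) \sum_{i=1}^n \pi_{ii}.$$

But $\sum_{i=1}^n \pi_{ii} = \mathrm{trace}(\pi)$ is equal to $n - d$. Since $\tilde{x}$ had variance $1 - o(1)$, the claim follows. □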
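The second moment computation can be checked numerically. The sketch below assumes that $\pi$ denotes the orthogonal projection onto the orthogonal complement of $W$, so that its trace is $n - d$; it uses real random sign vectors as a stand-in for the truncated entries, and the dimensions $n = 12$, $d = 5$ are arbitrary illustrative choices.

```python
import random

def orthonormal_basis(vectors):
    """Gram-Schmidt: orthonormal basis of the span of the given real vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(r, q))
            r = [a - c * b for a, b in zip(r, q)]
        norm = sum(a * a for a in r) ** 0.5
        basis.append([a / norm for a in r])
    return basis

def complement_projection(basis, n):
    """Matrix of the orthogonal projection onto the complement of span(basis)."""
    return [[(1.0 if i == j else 0.0) - sum(q[i] * q[j] for q in basis)
             for j in range(n)] for i in range(n)]

def dist_squared(x, proj):
    """dist(x, W)^2 as the quadratic form sum_{i,j} x_i pi_ij x_j (real x)."""
    n = len(x)
    return sum(x[i] * proj[i][j] * x[j] for i in range(n) for j in range(n))

rng = random.Random(2)
n, d = 12, 5
W = orthonormal_basis([[rng.gauss(0, 1) for _ in range(n)] for _ in range(d)])
pi = complement_projection(W, n)
trace = sum(pi[i][i] for i in range(n))

# Monte Carlo estimate of E dist(X, W)^2 for X with iid +-1 entries (mean 0, variance 1).
trials = 4000
estimate = sum(dist_squared([rng.choice((-1.0, 1.0)) for _ in range(n)], pi)
               for _ in range(trials)) / trials
print(round(trace, 6), estimate)
```

For entries of exactly unit variance the expectation equals the trace $n - d$; in the lemma the truncated variable has variance $1 - o(1)$, which produces the factor $1 - o(1)$.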



Since $n - d \ge n^{0.99}$ and $c < 1$, the claim (33) follows from (34) and the above lemma. The proof of Lemma 4.2 is now complete.



6. Proof of Lemma 4.3



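We now begin the proof of Lemma 4.3. Fix $\varepsilon$, and assume that $\delta$ is sufficiently small depending on $\varepsilon$. Write $n' := \lfloor (1-\delta)n \rfloor$. Observe that $\prod_{i=1}^{n'} \mathrm{dist}(\frac{1}{\sqrt{n}} X_i, V_i)$ is the $n'$-dimensional volume of the parallelepiped spanned by $\frac{1}{\sqrt{n}} X_1, \ldots, \frac{1}{\sqrt{n}} X_{n'}$, which is also equal to $\det(\frac{1}{n} A_{n,n'} A_{n,n'}^*)^{1/2}$, where $A_{n,n'}$ is the $n' \times n$ matrix with rows $X_1, \ldots, X_{n'}$. Expressing this determinant as the product of singular values, we conclude the identity

$$\frac{1}{n} \sum_{1 \le i \le (1-\delta)n} \log \mathrm{dist}\Big(\frac{1}{\sqrt{n}} X_i, V_i\Big) = \frac{1}{n} \sum_{i=1}^{n'} \log \Big(\frac{1}{\sqrt{n}} \sigma_i(A_{n,n'})\Big).$$

Similarly for $Y_i$, $W_i$, and $B_{n,n'}$ (the matrix generated by $Y_1, \ldots, Y_{n'}$). Thus it suffices to show that with probability $1 - O(\varepsilon)$, one has

$$\frac{1}{n} \sum_{i=1}^{n'} \Big[ \log \Big(\frac{1}{\sqrt{n}} \sigma_i(A_{n,n'})\Big) - \log \Big(\frac{1}{\sqrt{n}} \sigma_i(B_{n,n'})\Big) \Big] = O(\varepsilon) \qquad (35)$$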



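for all but finitely many $n$. We rewrite (35) as

$$\int_0^\infty \log t\,d\nu_{n,n'}(t) = O(\varepsilon) \qquad (36)$$

where $d\nu_{n,n'}$ is the difference of two ESDs:

$$\nu_{n,n'} = \mu_{\frac{1}{n'} A_{n,n'} A_{n,n'}^*} - \mu_{\frac{1}{n'} B_{n,n'} B_{n,n'}^*}.$$

We control (35) by dividing the range of $t$ into several parts.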
6.1. The region of very large t. We now control the region where $t \ge R_\varepsilon$ for some large $R_\varepsilon$.



From Lemma E21 we have that 



1 " 1 1 " 1 

n ' ' , r) r) < ' , hn 



n '■ — ' \^n n '■ — ' \in 



is almost surely bounded, and thus 

t\dVn^n>{t)\ 



'0 

is also almost surely bounded. Thus, with probability 1 — 0(e), we 
have 

Jo 
for all but finitely many n, and some C^ independent of n, which implies 

that 

/■oo 

/ \\ogt\\dur,,n'{t)\<e (37) 

J Re 

for all but finitely many n, and some R^ depending only on e. 



6.2. The region of intermediate t. We now control the region e^ < 

t<Re. 

Lemma 6.3. Let ip he a smooth function which equals 1 on [e^, /?e] and 
is supported on [e^ /2,2R^. Then with probability 1, we have 

oo 

i^{t)\ogtdun,n'{t)=0{e), (38) 



if 5 is sufficiently small depending on e and ip. 



Proof. From the interlacing property (Lemma A.l), we see that 

POO 

ij{t) logtdUn,n'{t) = ij{t) \ogtdVn,n(t) + 0{e) 

Jo 
if 6 is sufficiently small depending on e and ip. 

We now apply the recent result in [3l Theorem 1.1]. For the reader's 



convenience, we restate this result in the Appendix; see Theorem B.l 
This result asserts under the above hypotheses that the ESDs (i/ii^^^* 
and dfii^^Q* converge almost surely to the same limit (in fact, this 
limit is given explicitly in terms of the limiting distribution of fikM m* 
via the inverse Stieltjes transform of ([47j)). In particular, i/„ „ converges 
almost surely to zero, and the claim follows. D 
Remark 6.4. Note that for the convergence in probability case of Proposition 2.2 we need to apply Theorem B.1 to a subsequence of $n$ rather than to all $n$, thanks to the subsequence extraction performed at the beginning of Section 4.

6.5. The region of moderately small t. We now control the region 
S'^ < t < e"^. For this we need some bounds on the low singular values 
of An,n' and !?„,„/. 

Lemma 6.6. With probability 1, we have 

1 "' 1 

-y2{ a,iAn,n')r' = 0(1) (39) 

for all but finitely many n, and similarly with A^y replaced by Bny ■ 
Proof. Clearly it suffices to establish the claim for Any. Using Propo- 



sition 5.1| and the Borel-Cantelli lemma, we see that with probability 



1, we have 



1 1 

dist(^Xi,span(Xi,.. . ,Xi_i,Xj+i,. . . ,X„/)) > -V6n 
In 1 



for all but finitely many n, and all 1 < i < n'. The claim then follows 
from Lemma [A. 4[ D 

Since the cri(A„^„/) are decreasing in i, and n! = [(1 — 6)n\, we see 
that the above lemma implies that with probability 1, we have 

= Cr[(l-2<5)nj(A«,n') > c6 



n 

for all but finitely many n, and some absolute constant c > 0. We can 
generalize this lower bound to handle higher singular values also: 

Lemma 6.7. There exists an absolute constant c > such that with 
probability 1, we have 

(TiiAn,n') > C (40) 



for all but finitely many n, and all 1 < i < {1 — 2S)n, and similarly 
with Any replaced by Bny ■ 

Proof. Clearly it suffices to establish the claim for Any- Using Propo- 
sition |5.1| and the Borel-Cantelli lemma, we see that with probability 
1, we have 

diRt('^Xi,span(Xi,...,Xi_i,Xj+i,...,X„//)) > -^/n-n" 
n I 
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for all but finitely many n, and all 1 < i < n" and n/2 < n" < n'. 



Applying Lemma |A.4[ we conclude that we almost surely have 

n 



1 1 

n -i— ^ , /n 'I 



n ^ — ' \/n 



for all but finitely many n, and all n/2 < n" < n' . Using the crude 
bound 

n" 

1 1 

E{—=ai{An,n"))~'^ > {n - n"){^(T2n"-n{An,n"))~'^ 



jn \/n 

we conclude that we almost surely have 



1 ,n — n' 

C2n"-n(^n,n") > C 



// 



n n 



for all but finitely many n, all n/2 < n" < n', and some absolute 
constant d > 0. The claim now follows from the Cauchy interlacing 



property (Lemma A.l). D 



Remark 6.8. If one assumes stronger moment assumptions (e.g. sub-gaussian) on $x$, then more precise bounds are known, especially in the $M_n = 0$ case: see [19], [20].



From this lemma we can now bound the relevant contribution to (35): 



Lemma 6.9. With probability 1, and if 6 is sufficiently small depending 
on e, we have 



f 



\ogt\\diyn.n'{t)\=0{e) (41) 

for all but finitely many n. 

Proof. By the triangle inequality and symmetry it suffices to show that 
with probability 1, we have 

/ \\ogt\dfii^A ^, (t) = 0{e) 

for all but finitely many n. We rewrite the left-hand side as 

1 "' 1 

n ' ' , hn 



n '■ — ' \/n 



where fit) := |logt|I(5^ < ^^ < £^^)- Since / cannot exceed |log5|, we 
see that the contribution of the case i > (1 — 2(5) n is acceptable if 5 is 
small enough, so it suffices to show that we almost surely have 






n ^ — ' \/n 

l<i<(l~25)n 
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for all but finitely many n. 



By Lemma 6.7, we may assume that n is such that (40) holds. As a 



consequence, we see that the only terms in the above sum which are 
non-vanishing are those for which i = {1 — 0{e'^))n. But then if we 
apply (40 ) and crudely estimate f(t)<— log t we obtain the claim. D 



6.10. The contribution of very small t. Finally, we need to control 
the contribution when t < 6. 

Lemma 6.11. With probability 1, and if 6 is sufficiently small depend- 
ing on e, we have 

|logt||rfz/„,„,(t)|=0(£) (42) 

for all but finitely many n. 



Proof. By arguing as in the proof of Lemma 6.9 , it suffices to show that 
we almost surely have 



1 " 1 

n ^-^ Jn 



for all but finitely many n, where g{t) := | logt|I(t^ < ^^)- 



By Lemmas 6.6, we may assume n is such that (39) holds. On the 
other hand, if 5 is small enough, we have the bound g{t) < et^"^. The 
claim now follows from (39). D 



Putting together (37), (38), (41), (42) we see that with probability $1 - O(\varepsilon)$, we have (36) for all but finitely many $n$, and the claim follows.



7. Extensions 



7.1. Proof of Theorem 1.17. The theorem in the case of almost sure convergence follows immediately from Theorem 1.7 by conditioning on $M_n$, so it remains to verify the theorem in the case of convergence in probability.

Let us fix a test function $f$ (as in (1)) and a positive $\varepsilon$. By the boundedness in probability of $\frac{1}{n^2} \|M_n\|_2^2$, we can find a $C = C_\varepsilon$ such that $P(M_n \in \Omega_n) \ge 1 - \varepsilon$, where

$$\Omega_n := \Big\{ M \in M_n(\mathbb{C}) : \frac{1}{n^2} \|M\|_2^2 \le C_\varepsilon \Big\}.$$

Let $M_n'$ be the matrix in $\Omega_n$ which maximizes the quantity

$$P\Big( \Big| \int f(z)\,d\mu_{\frac{1}{\sqrt{n}}(M_n' + X_n)}(z) - \int f(z)\,d\mu_{\frac{1}{\sqrt{n}}(M_n' + Y_n)}(z) \Big| \ge \varepsilon \Big).$$

Applying Theorem 1.7 to the sequences $M_n' + X_n$ and $M_n' + Y_n$, we see that this quantity is $o(1)$.

Theorem 1.17 follows by integrating over all possible values of $M_n$ using the definition of $M_n'$, as well as the fact that $P(\Omega_n) \ge 1 - \varepsilon$, and then letting $\varepsilon \to 0$.



7.2. Proof of Theorem 1.18, We first verify the claim for conver- 
gence in probability. 



The condition (i) of Theorem 2.1 is satisfied thanks to the boundedness 
in probability of pi). In order to complete the proof, one needs to check 
(ii). Notice that 



deti^Ar, - zl) = det(^(i^-iM„L„i + X„) - zR-^L-^) det L„i^„. 



The term det LnKn also appears in det(-7^i?„ — zl) and becomes ad- 
ditive (and thus cancels) after taking logarithm. Therefore, one only 
needs to show that 



i log I det [j^iK-'M^L-' + X„) - zK-'L-j) 
-I log I det (^(ir„"iM„L;i + K) - zK;,^L- 
converges in probability to zero. 



One can obtain this by repeating the proof of Proposition |2.2[ The 
slight change here is that zl is replaced by zK~^L:^^, but this has no 
significant impact, except that we need to show 



{K-'M^L-' - zK-'L 



-Ir-lN 



satisfies 



If the maximum is not attained, one can instead choose M^ to be a matrix 
which maximizes this quantity to within a factor of two (say) . 
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1 1 



trace F„F: = —||F„||^ = 0(1) 



almost surely (in order to guarantee (pi)). But this is a consequence of 
the boundedness in probability of (|5]). 

The proof of the almost sure convergence is established similarly, with 
the obvious changes (e.g. replacing boundedness in probability with 
almost sure boundedness). We omit the details. 



8. Proof of Theorem 11.201 
We first prove that (ii) implies (i) for almost sure convergence. Let A^ 



and /i be as in Theorem 1.20 Construct a diagonal matrix B'^ whose 



diagonal entries are independent samples from /z and let B^ := \/nB'^. 



We wish to invoke Theorem 2.1 We first need to verify the almost sure 



boundedness of rt9h. The bound for An follows from Lemma 1.9, and 



the bound for Bn follows from the second moment hypothesis on ^ and 



the (strong) law of large numbers. By Theorem 2.1, the problem now 



reduces to showing that for almost all complex numbers z^ 

$$\frac{1}{n}\log\left|\det\left(\frac{1}{\sqrt{n}}A_n - zI\right)\right| - \frac{1}{n}\log\left|\det\left(\frac{1}{\sqrt{n}}B_n - zI\right)\right|$$

converges almost surely to zero. The second term is easy to compute:

$$\frac{1}{n}\log\left|\det\left(\frac{1}{\sqrt{n}}B_n - zI\right)\right| = \frac{1}{n}\log|\det(B'_n - zI)| = \frac{\sum_{i=1}^n \log|\lambda_i - z|}{n},$$

where $\lambda_i$ are iid samples from $\mu$. On the other hand, from Fubini's theorem we see that $\int_{\mathbf{C}} |\log|w - z||\, d\mu(w)$ is locally integrable in $z$, and thus

$$\int_{\mathbf{C}} |\log|w - z||\, d\mu(w) < \infty \tag{43}$$

for almost every $z$. If $z$ is such that (43) holds, then by the strong law of large numbers, we see that $\frac{1}{n}\sum_{i=1}^n \log|\lambda_i - z|$ converges almost surely to $\int_{\mathbf{C}} \log|w - z|\, d\mu(w)$. This shows that (ii) implies (i) for almost sure convergence. The proof for convergence in probability is identical and is left as an exercise to the reader.
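The law-of-large-numbers step can be illustrated numerically. The sketch below is only an illustration under an assumption not made in the text: it takes $\mu$ to be the uniform measure on the unit disk, for which the logarithmic potential has the closed form $\log|z|$ for $|z| \ge 1$ and $(|z|^2 - 1)/2$ for $|z| \le 1$. The function names, the sample size and the test point are hypothetical choices.

```python
import math
import random

def sample_uniform_disk(rng):
    """Draw one point uniformly from the unit disk (radius = sqrt of a uniform)."""
    r = math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return complex(r * math.cos(theta), r * math.sin(theta))

def empirical_log_potential(z, n, rng):
    """(1/n) * sum_i log|lambda_i - z| for n iid samples lambda_i from mu."""
    total = 0.0
    for _ in range(n):
        total += math.log(abs(sample_uniform_disk(rng) - z))
    return total / n

def disk_log_potential(z):
    """Closed form of int log|w - z| dmu(w) when mu is uniform on the unit disk."""
    r = abs(z)
    return math.log(r) if r >= 1.0 else (r * r - 1.0) / 2.0

rng = random.Random(0)          # fixed seed so the run is reproducible
z = 0.3 + 0.4j                  # hypothetical test point with |z| = 0.5
estimate = empirical_log_potential(z, 20000, rng)
exact = disk_log_potential(z)
print(estimate, exact)
```

With more samples the empirical average should move closer to the closed-form value, which is the behaviour the strong law of large numbers predicts whenever (43) holds.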

Now we show that (iii) implies (ii) for almost sure convergence. Let $z$ be such that (43) and (iii) hold. To show (ii), it suffices from (11) to show that $\frac{1}{n}\sum_{i=1}^n \log\sigma_i$ converges almost surely to $\int_{\mathbf{C}} \log|w - z|\, d\mu(w)$, where $\sigma_i = \sigma_i(\frac{1}{\sqrt{n}}A_n - zI)$ are the singular values of $\frac{1}{\sqrt{n}}A_n - zI$. On the other hand, from (iii) we already know that $\frac{1}{n}\sum_{i=1}^n \log\sqrt{\sigma_i^2 + \epsilon_n}$ converges almost surely to $\int_{\mathbf{C}} \log|w - z|\, d\mu(w)$. Thus it suffices to show that

$$\frac{1}{n}\sum_{i=1}^n \left(\log\sqrt{\sigma_i^2 + \epsilon_n} - \log\sigma_i\right) \tag{44}$$

converges almost surely to zero.

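From Lemma 1.9 we know that $\frac{1}{n^2}\|A_n\|_2^2$ is almost surely bounded, and so for each $z$

$$\frac{1}{n}\sum_{i=1}^n \sigma_i^2 = \frac{1}{n}\left\|\frac{1}{\sqrt{n}}A_n - zI\right\|_2^2$$

is almost surely bounded also. From this we easily see that

$$\frac{1}{n}\sum_{1 \le i \le n:\ \sigma_i \ge \delta_n} \left(\log\sqrt{\sigma_i^2 + \epsilon_n} - \log\sigma_i\right)$$

converges almost surely to zero for some sequence $\delta_n$ (depending on $\epsilon_n$) converging sufficiently slowly to zero. To conclude the almost sure convergence of (44) to zero, it thus suffices to show that

$$\frac{1}{n}\sum_{1 \le i \le n:\ \sigma_i < \delta_n} \log\frac{1}{\sigma_i}$$

converges almost surely to zero. Using Lemma 4.1, we almost surely have $\sup_i \log\frac{1}{\sigma_i} \le O(\log n)$ for all but finitely many $n$, so it suffices to show that

$$\frac{1}{n}\sum_{1 \le i \le n - n^{0.99}:\ \sigma_i < \delta_n} \log\frac{1}{\sigma_i}$$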

converges almost surely to zero. To do this, it suffices by the union bound and the Borel-Cantelli lemma to show that

$$\mathbf{P}\left(\sigma_{n-i} \le c\,\frac{i}{n}\right) = O(\exp(-n^{0.01})) \tag{45}$$

for all $1 \le i \le n - n^{0.99}$ and some $c > 0$ independent of $n$.
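The claim above that the indices with $\sigma_i \ge \delta_n$ contribute negligibly can be made explicit. The following is a sketch under two assumptions of ours: that the summand of (44) is $\log\sqrt{\sigma_i^2+\epsilon_n} - \log\sigma_i$ with $\epsilon_n > 0$ and $\epsilon_n \to 0$, and that only the elementary bound $\log(1+t) \le t$ for $t \ge 0$ is used.

```latex
0 \;\le\; \log\sqrt{\sigma_i^2+\epsilon_n} - \log\sigma_i
  \;=\; \frac{1}{2}\log\Bigl(1+\frac{\epsilon_n}{\sigma_i^2}\Bigr)
  \;\le\; \frac{\epsilon_n}{2\sigma_i^2}
  \;\le\; \frac{\epsilon_n}{2\delta_n^2}
  \qquad \text{whenever } \sigma_i \ge \delta_n .
```

Averaging over the indices with $\sigma_i \ge \delta_n$ then gives a quantity of size at most $\epsilon_n/(2\delta_n^2)$, which tends to zero as soon as $\delta_n \to 0$ slowly enough that $\epsilon_n/\delta_n^2 \to 0$; the choice $\delta_n := \epsilon_n^{1/4}$ is one possibility.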



For this we argue as in the proof of Lemma 6.7. Fix $i$. Let $A'_n$ be the matrix formed by the first $n - k$ rows of $A_n - z\sqrt{n}I$ with $k := i/2$, and let $\sigma'_j$, $1 \le j \le n - k$, be the singular values of $A'_n$ (in decreasing order, as usual). By the interlacing law (Lemma A.1) and re-normalizing,

$$\sigma_{n-i} \ge \frac{1}{\sqrt{n}}\,\sigma'_{n-i}. \tag{46}$$

By Lemma A.4 we have that

$$\sigma_1'^{-2} + \dots + \sigma_{n-k}'^{-2} = \operatorname{dist}_1^{-2} + \dots + \operatorname{dist}_{n-k}^{-2},$$

where $\operatorname{dist}_j$ is the distance from the $j$th row of $A'_n$ to the subspace spanned by the remaining rows.

As shown in the proof of Lemma 4.2, with probability $1 - \exp(-n^{0.01})$, $\operatorname{dist}_j$ is bounded from below by $\Omega(\sqrt{k}) = \Omega(\sqrt{i})$ for all $j$. Thus, with this probability, the right hand side in the above identity is $O(n/i)$. On the other hand, as the $\sigma'_j$ are ordered decreasingly, the left hand side is at least

$$(i - k)\,\sigma_{n-i}'^{-2} = \frac{i}{2}\,\sigma_{n-i}'^{-2}.$$

It follows that with probability $1 - \exp(-n^{0.01})$,

$$\sigma'_{n-i} = \Omega\left(\frac{i}{\sqrt{n}}\right).$$

This and (46) complete the proof of (45), and so (44) converges almost surely to zero.



As previously observed, the convergence of (44) to zero shows that (iii) implies (ii) for almost sure convergence. An inspection of the argument shows the convergence of (44) to zero also lets us deduce (iii) from (ii). The claim for convergence in probability follows similarly. To conclude the proof of Theorem 1.20, it thus suffices to show that (i) implies (ii).

Again we start with the almost sure convergence case. Assume that (i) holds, and let $z$ be such that (43) holds. By shifting $A_n$ by $\sqrt{n}zI$ if necessary we may take $z$ to be zero. Let $\lambda_1, \dots, \lambda_n$ denote the eigenvalues of $\frac{1}{\sqrt{n}}A_n$. By (11), it suffices to show that $\frac{1}{n}\sum_{i=1}^n \log|\lambda_i|$ converges almost surely to $\int_{\mathbf{C}} \log|w|\, d\mu(w)$. From (13) we know that $\frac{1}{n}\sum_{i=1}^n |\lambda_i|^2$ is almost surely bounded. From this and (i) we conclude that $\frac{1}{n}\sum_{i=1}^n \log(|\lambda_i| + \epsilon)$ converges almost surely to $\int_{\mathbf{C}} \log(|w| + \epsilon)\, d\mu(w)$ for any fixed $\epsilon > 0$. Combining this with (43) and dominated convergence, we see that $\frac{1}{n}\sum_{i=1}^n \log(|\lambda_i| + \epsilon_n)$ converges almost surely to $\int_{\mathbf{C}} \log|w|\, d\mu(w)$ for some sequence $\epsilon_n > 0$ converging sufficiently slowly to zero. It thus suffices to show that

$$\frac{1}{n}\sum_{i=1}^n \left(\log(|\lambda_i| + \epsilon_n) - \log|\lambda_i|\right)$$

converges almost surely to zero.




By repeating the arguments used to establish the almost sure convergence of (44) to zero, it suffices to show that

$$\frac{1}{n}\sum_{1 \le i \le n:\ |\lambda_i| < \delta_n} \log\frac{1}{|\lambda_i|}$$

converges almost surely to zero.

Let us order the eigenvalues $\lambda_i$ so that $|\lambda_1| \ge \dots \ge |\lambda_n|$. From Lemma 4.1 and (45) (and the Borel-Cantelli lemma) we know that we almost surely have

$$\frac{1}{n}\sum_{(1-\kappa)n \le i \le n} \log\frac{1}{\sigma_i} \le O\left(\kappa\log\frac{1}{\kappa}\right)$$

for all but finitely many $n$ for any fixed $0 < \kappa < 1/2$, and hence by Weyl's comparison inequality (Lemma A.3) that we almost surely have

$$\frac{1}{n}\sum_{(1-\kappa)n \le i \le n} \log\frac{1}{|\lambda_i|} \le O\left(\kappa\log\frac{1}{\kappa}\right)$$

for all but finitely many $n$ also. Since the left-hand side is bounded from below by $\kappa\log\frac{1}{|\lambda_{\lfloor(1-\kappa)n\rfloor}|}$, we almost surely conclude a lower bound of the form

$$|\lambda_{\lfloor(1-\kappa)n\rfloor}| \ge \kappa^{O(1)}$$

for all but finitely many $n$. In particular (by setting $\delta$ to be a suitable power of $\kappa$) this implies that almost surely

$$\frac{1}{n}\sum_{1 \le i \le n:\ |\lambda_i| < \delta} \log\frac{1}{|\lambda_i|} \le O(\delta^c)$$

for all but finitely many $n$ for any fixed $0 < \delta \ll 1$ and some absolute constant $c > 0$, and the claim follows. The analogous implication for convergence in probability is similar. The proof of Theorem 1.20 is now complete.



Appendix A. Linear algebra inequalities 

In this appendix we record some elementary identities and inequalities 
regarding the eigenvalues and singular values of matrices. 

Lemma A.1 (Cauchy's interlacing law). Let $A$ be an $n \times n$ matrix with complex entries and $A'$ be the submatrix formed by the first $m := n - k$ rows. Let $\sigma_1(A) \ge \dots \ge \sigma_n(A) \ge 0$ denote the singular values of $A$, and similarly for $A'$. Then we have

$$\sigma_i(A) \ge \sigma_i(A') \ge \sigma_{i+k}(A)$$

for every $1 \le i \le n - k$.

Proof. The claim follows easily from the minimax characterization

$$\sigma_i(A) = \sup_{V_i}\ \inf_{v \in V_i,\ \|v\| = 1} \|Av\|$$

and

$$\sigma_i(A') = \sup_{V_i}\ \inf_{v \in V_i,\ \|v\| = 1} \|A'v\|$$

of the singular values, where $V_i$ ranges over $i$-dimensional complex subspaces. $\Box$
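As a quick sanity check of the interlacing law in the smallest nontrivial case, the sketch below takes $n = 2$, $k = 1$, so that $A'$ is the first row of $A$. The test matrix and the helper name are hypothetical, and the singular values are obtained from the identities $\sigma_1^2 + \sigma_2^2 = \sum |a_{ij}|^2$ and $\sigma_1\sigma_2 = |\det A|$ rather than from a library routine.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values (largest first) of [[a, b], [c, d]] with complex entries.

    Uses sigma_1^2 + sigma_2^2 = sum of |entry|^2 and sigma_1 * sigma_2 = |det|.
    """
    t = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = math.sqrt(max(t * t - 4.0 * det_abs ** 2, 0.0))
    return math.sqrt((t + disc) / 2.0), math.sqrt(max((t - disc) / 2.0, 0.0))

# Hypothetical test matrix A; A' is its first row (n = 2, k = 1, m = 1).
a, b, c, d = 1 + 2j, -0.5j, 3.0, 2 - 1j
s1, s2 = singular_values_2x2(a, b, c, d)

# The 1 x 2 matrix A' = (a, b) has a single singular value: its Euclidean norm.
s1_sub = math.hypot(abs(a), abs(b))

print(s1, s1_sub, s2)  # expected ordering: s1 >= s1_sub >= s2
```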

Lemma A.2 (Weyl comparison inequality for second moment). Let $A = (a_{ij})_{1 \le i,j \le n} \in M_n(\mathbf{C})$ have generalized eigenvalues $\lambda_1, \dots, \lambda_n \in \mathbf{C}$ and singular values $\sigma_1(A) \ge \dots \ge \sigma_n(A) \ge 0$. Then

$$\sum_{j=1}^n |\lambda_j|^2 \le \sum_{j=1}^n \sigma_j(A)^2 = \|A\|_2^2 = \sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2.$$

Proof. The two equalities here are clear, so it suffices to prove the inequality. By the Jordan normal form we can write $A = BUB^{-1}$ for some upper-triangular $U$ and invertible $B$. By the QR factorization we can write $B = QR$ for some unitary $Q$ and upper triangular $R$. We conclude that $A = QVQ^{-1}$ for some upper triangular $V$. Conjugating by $Q$, we thus reduce to the case when $A$ is an upper triangular matrix, in which case the eigenvalues are simply the diagonal entries $a_{11}, \dots, a_{nn}$ and the claim is clear. $\Box$

We also have the following (stronger) variant of the above inequality:

Lemma A.3 (Weyl comparison inequality for products). Let $A = (a_{ij})_{1 \le i,j \le n} \in M_n(\mathbf{C})$ have generalized eigenvalues $\lambda_1, \dots, \lambda_n \in \mathbf{C}$, ordered so that $|\lambda_1| \ge \dots \ge |\lambda_n|$, and singular values $\sigma_1(A) \ge \dots \ge \sigma_n(A) \ge 0$. Then we have

$$\prod_{j=1}^J |\lambda_j| \le \prod_{j=1}^J \sigma_j(A)$$

and

$$\prod_{j=J+1}^n \sigma_j(A) \le \prod_{j=J+1}^n |\lambda_j|$$

for all $0 \le J \le n$.

Proof. It suffices to prove the former claim, as the latter then follows from (11). By arguing as in Lemma A.2 we may assume that $A$ is upper triangular, so that the diagonal entries are some permutation of $\lambda_1, \dots, \lambda_n$. Consider the symmetric minor $A'$ of $A$ formed by the rows and columns corresponding to the entries $\lambda_1, \dots, \lambda_J$. The determinant of this matrix is then $\lambda_1 \cdots \lambda_J$, and thus by (11) we have

$$\prod_{j=1}^J \sigma_j(A') = \prod_{j=1}^J |\lambda_j|.$$

The claim then follows from the Cauchy interlacing inequality (Lemma A.1). $\Box$
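Both Weyl comparison inequalities can be checked by hand for a $2 \times 2$ matrix. The sketch below is an illustration only: the test matrix and the helper names are hypothetical, eigenvalues come from the characteristic polynomial, and the eigenvalue moduli are sorted in decreasing order before comparison.

```python
import cmath
import math

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

def singular_values_2x2(a, b, c, d):
    """Singular values (largest first), via trace and determinant of A*A."""
    t = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2
    det_abs = abs(a * d - b * c)
    disc = math.sqrt(max(t * t - 4.0 * det_abs ** 2, 0.0))
    return math.sqrt((t + disc) / 2.0), math.sqrt(max((t - disc) / 2.0, 0.0))

# Hypothetical test matrix.
a, b, c, d = 1 + 2j, -0.5j, 3.0, 2 - 1j
lam = sorted((abs(x) for x in eigenvalues_2x2(a, b, c, d)), reverse=True)
s1, s2 = singular_values_2x2(a, b, c, d)
frob_sq = sum(abs(x) ** 2 for x in (a, b, c, d))

# Second-moment inequality: sum |lambda_j|^2 <= sum sigma_j^2 = sum |a_ij|^2.
second_moment_gap = (s1 ** 2 + s2 ** 2) - (lam[0] ** 2 + lam[1] ** 2)

# Product inequalities for n = 2: |lambda_1| <= sigma_1, sigma_2 <= |lambda_2|,
# and equality of the full products (both equal |det A|).
print(second_moment_gap, lam, (s1, s2), frob_sq)
```

For a normal matrix the gap in the second-moment inequality would be zero; the test matrix here is not normal, so a strictly positive gap is expected.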



Now we record a useful identity for the negative second moment of a rectangular matrix.

Lemma A.4 (Negative second moment). Let $1 \le n' \le n$, and let $A$ be a full rank $n' \times n$ matrix with singular values $\sigma_1(A) \ge \dots \ge \sigma_{n'}(A) > 0$ and rows $X_1, \dots, X_{n'} \in \mathbf{C}^n$. For each $1 \le i \le n'$, let $W_i$ be the hyperplane generated by the $n' - 1$ rows $X_1, \dots, X_{i-1}, X_{i+1}, \dots, X_{n'}$. Then

$$\sum_{j=1}^{n'} \sigma_j(A)^{-2} = \sum_{i=1}^{n'} \operatorname{dist}(X_i, W_i)^{-2}.$$

Proof. Observe that the $n' \times n'$ matrix $(AA^*)^{-1}$ has eigenvalues $\sigma_1(A)^{-2}, \dots, \sigma_{n'}(A)^{-2}$. Taking traces, we conclude that

$$\sum_{j=1}^{n'} \sigma_j(A)^{-2} = \sum_{j=1}^{n'} (AA^*)^{-1}e_j \cdot e_j,$$

where $e_1, \dots, e_{n'}$ is the standard basis of $\mathbf{C}^{n'}$. But if $v_j := (AA^*)^{-1}e_j = (v_{j,1}, \dots, v_{j,n'})$, then $A^*v_j = v_{j,1}X_1 + \dots + v_{j,n'}X_{n'}$ is orthogonal to $A^*e_i = X_i$ for $i \ne j$ (and thus orthogonal to $W_j$), and has an inner product of 1 with $A^*e_j = X_j$. Taking inner products of $A^*v_j$ with the orthogonal projection of $X_j$ to $W_j^\perp$, we conclude that

$$v_{j,j}\operatorname{dist}(X_j, W_j)^2 = 1.$$

Since $v_{j,j} = v_j \cdot e_j = (AA^*)^{-1}e_j \cdot e_j$, the claim follows. $\Box$
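The negative second moment identity is easy to test on a small real example. In the sketch below (the matrix and helper names are hypothetical), $A$ is a $2 \times 3$ matrix, so each $W_i$ is the line spanned by the other row. The left-hand side is computed as the trace of the inverse of the Gram matrix $AA^*$, and the right-hand side from explicitly computed projection residuals.

```python
import math

def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))

def dist_to_line(x, w):
    """Distance from the vector x to the line spanned by the nonzero vector w."""
    coeff = dot(x, w) / dot(w, w)
    residual = [xi - coeff * wi for xi, wi in zip(x, w)]
    return math.sqrt(dot(residual, residual))

# Hypothetical full-rank 2 x 3 real matrix with rows X1, X2 (n' = 2, n = 3).
X1 = [1.0, 2.0, 0.0]
X2 = [0.0, 1.0, 3.0]

# Gram matrix A A^T = [[g11, g12], [g12, g22]]; the trace of its inverse is
# (g11 + g22) / det, which equals sigma_1(A)^-2 + sigma_2(A)^-2.
g11, g12, g22 = dot(X1, X1), dot(X1, X2), dot(X2, X2)
det = g11 * g22 - g12 * g12
lhs = (g11 + g22) / det

# Sum of inverse squared distances from each row to the span of the other row.
rhs = dist_to_line(X1, X2) ** -2 + dist_to_line(X2, X1) ** -2

print(lhs, rhs)
```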

Appendix B. A result of Dozier and Silverstein

Here we reproduce Theorem 1.1 of [3] which we used in the end of Section 6.

Theorem B.1. [3, Theorem 1.1] Let $c$ be a positive constant and $x$ be a random variable with variance one. Let $X_n$ be an $n \times r$ random matrix whose entries are iid copies of $x$, where $r = (c + o(1))n$. Let $M_n$ be a random $n \times r$ matrix independent from $X_n$ such that the ESD of $M_nM_n^*$ converges to a limiting distribution $H$. Define $C_n := \frac{1}{n}(M_n + X_n)(M_n + X_n)^*$. Then the ESD of $C_n$ converges almost surely (and hence also in probability) to a limiting distribution $F$, whose Stieltjes transform $m(z) := \int \frac{1}{\lambda - z}\, dF(\lambda)$ satisfies the integral equation

$$m = \int \frac{dH(t)}{\frac{t}{1 + cm} - (1 + cm)z + (1 - c)} \tag{47}$$

for any $z \in \mathbf{C}^+$.

Remark B.2. The theorem still holds if we restrict the size $n$ of the matrices to an infinite subsequence $n_1 < n_2 < \dots$ of positive integers. One can show this by, for example, artificially filling in the missing indices or repeating the proof of Theorem B.1 under this restriction.

Remark B.3. In (47), $H$ appears, but the actual definition of $M_n$ is irrelevant. Thus, one can conclude that if $M_n$ and $M'_n$ are such that the ESDs of $M_nM_n^*$ and $M'_nM_n'^*$ tend to the same limit, then the ESDs of $\frac{1}{n}(M_n + X_n)(M_n + X_n)^*$ and $\frac{1}{n}(M'_n + X_n)(M'_n + X_n)^*$ also tend to the same limit.

Remark B.4. It was mentioned by Speicher [21] and also Krishnapur (private communication) that Theorem B.1 can be proved using free probability, which is different from the approach in [3].



Appendix C. Using a Hermitian invariance principle
(by Manjunath Krishnapur)

The authors have shown invariance principles for ESDs of several non-Hermitian matrix models. As in earlier papers, the proof goes through Hermitian matrices, but does not need rates of convergence of the Hermitian ESDs, thanks to new ideas such as Lemma 4.2. However, because of the use of Theorem B.1, it may appear that a limiting result for the associated Hermitian matrices is necessary to carry the program through. In this appendix, we point out how one may obtain a weak invariance principle for ESDs of non-Hermitian matrices by using an invariance principle for Hermitian matrices due to Chatterjee [4], in cases where a convergence result such as Theorem B.1 is not available. As mentioned earlier, other parts of the proof do not require that the entries be iid. Thus, as a consequence, we can obtain a weak invariance principle for a random matrix model with independent but not identically distributed entries.

We need the following definition from [26, Section 2].

Definition C.1 (Controlled second moment). Let $\kappa \ge 1$. A complex random variable $x$ is said to have $\kappa$-controlled second moment if one has the upper bound

$$\mathbf{E}|x|^2 \le \kappa$$

(in particular, $|\mathbf{E}x| \le \kappa^{1/2}$), and the lower bound

$$\mathbf{E}\,\mathrm{Re}(zx - w)^2\,\mathbf{I}(|x| \le \kappa) \ge \frac{1}{\kappa}\mathrm{Re}(z)^2 \tag{48}$$

for all complex numbers $z, w$.

Example. The Bernoulli random variable ($\mathbf{P}(x = +1) = \mathbf{P}(x = -1) = 1/2$) has 1-controlled second moment. The condition (48) asserts in particular that $x$ has variance at least $\frac{1}{\kappa}$, but also asserts that a significant portion of this variance occurs inside the event $|x| \le \kappa$, and also contains some more technical phase information about the covariance matrix of $\mathrm{Re}(x)$ and $\mathrm{Im}(x)$.
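The Bernoulli example can be verified directly. The sketch below assumes that $\mathrm{Re}(zx - w)^2$ in (48) means $(\mathrm{Re}(zx - w))^2$; the function name and the grid of test values for $z$ and $w$ are hypothetical. For $x = \pm 1$ the truncated expectation works out to $\mathrm{Re}(z)^2 + \mathrm{Re}(w)^2$, so the gap to the lower bound with $\kappa = 1$ is $\mathrm{Re}(w)^2 \ge 0$.

```python
import itertools

def truncated_second_moment(z, w, kappa=1.0):
    """E[(Re(z*x - w))^2 * 1{|x| <= kappa}] for x uniform on {+1, -1}."""
    total = 0.0
    for x in (1.0, -1.0):
        if abs(x) <= kappa:          # always true here when kappa >= 1
            total += 0.5 * (z * x - w).real ** 2
    return total

# Hypothetical grid of complex test values for z and w.
coords = (-2.0, -0.5, 0.0, 1.0, 3.0)
grid = [complex(p, q) for p, q in itertools.product(coords, repeat=2)]

# Smallest value of LHS - RHS of the lower bound with kappa = 1 over the grid.
worst_gap = min(
    truncated_second_moment(z, w) - z.real ** 2 for z in grid for w in grid
)
print(worst_gap)
```

The upper bound $\mathbf{E}|x|^2 \le \kappa$ holds with equality for $\kappa = 1$, since $|x| = 1$ always.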

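Theorem C.2. Let $M_n = (\mu_{ij}^{(n)})_{i,j \le n}$ and $C_n = (\sigma_{ij}^{(n)})_{i,j \le n}$ be constant (i.e. deterministic) matrices satisfying

(1) $\sup_n \frac{1}{n^2}\|M_n\|_2^2 < \infty$,

(2) $a \le \sigma_{ij}^{(n)} \le b$ for all $n, i, j$ for some $0 < a < b < \infty$.

Given a matrix $X = (x_{ij})_{i,j \le n}$, set

$$A_n(X) = \frac{1}{\sqrt{n}}(M_n + C_n \cdot X) = \frac{1}{\sqrt{n}}\left(\mu_{ij}^{(n)} + \sigma_{ij}^{(n)}x_{ij}\right)_{i,j \le n}$$

(here "$\cdot$" denotes Hadamard product).

Now suppose that $x_{ij}^{(n)}$ are independent complex-valued random variables with $\mathbf{E}[x_{ij}^{(n)}] = 0$ and $\mathbf{E}[|x_{ij}^{(n)}|^2] = 1$ and that $y_{ij}^{(n)}$ are independent random variables, also having zero mean and unit variance.

Assume furthermore that both $x_{ij}^{(n)}$ and $y_{ij}^{(n)}$ have $\kappa$-controlled second moment for some constant $\kappa > 0$.

Assume also Pastur's condition

$$\frac{1}{n^2}\sum_{i,j=1}^n \mathbf{E}\left[|x_{ij}^{(n)}|^2\,\mathbf{1}_{|x_{ij}^{(n)}| \ge \epsilon\sqrt{n}}\right] \to 0 \quad \text{for all } \epsilon > 0, \tag{49}$$

and the same for $Y$ in place of $X$. Then,

$$\mu_{A_n(X)} - \mu_{A_n(Y)} \to 0$$

in the sense of probability.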
Some remarks.

(1) If we assume that $x_{ij}^{(n)}$ are i.i.d. and $y_{ij}^{(n)}$ are i.i.d., then Pastur's condition is obviously satisfied. Further, the condition of $\kappa$-controlled second moment is also not necessary (see the first step in the proof sketch).

(2) Although the weak invariance principle in the paper uses only subsequential limits (see Remark 6.4), it does use Theorem B.1 to say that subsequential limits are the same for $X$ as for $Y$. Hence we need some changes in the proof in order to establish Theorem C.2, which we do in this appendix.

(3) This highlights the important new ideas of the paper, such as Lemma 4.2, which eliminate the need for rates of convergence of ESDs of the Hermitian matrices $(A_n - zI)^*(A_n - zI)$. This is unlike all earlier papers in the subject that followed Bai's approach and required such rates (e.g., [1], [26], [9], [15]). The need for rates made it impossible to use the invariance principle for Hermitian matrices as we shall do now.

(4) Take $C_n = J$ (all ones matrix) and $M_n = 0$. Then Pastur's condition (49) implies almost sure convergence of the ESD of $A_n(X)^*A_n(X)$ (see [2, Theorem 3.9]). For general $C_n$, since we use Chatterjee's invariance principle which assumes Pastur's condition but only gives weak invariance, we are able to assert only weak invariance for the non-Hermitian ESDs also. Thus, there is some room for improvement here, namely, to strengthen the conclusion of Theorem C.2 to almost sure convergence.

(5) Does the ESD of $A_n(X)$ converge? Perhaps so, provided the singular values of $C_n - zI$ have a limiting measure for every $z$. In [12] we have discussed some easy-to-check sufficient conditions on $C_n$ which imply convergence.



The following lemma is a "Wishart" analogue of the computations in section 2 of [1] which considers Wigner matrices. As in that paper, the idea is to consider the Stieltjes transform of the ESD of $A_n(X)^*A_n(X)$ as a function of $X$. However a slight twist is needed as compared to Wigner matrices, because the entries of $A_n(X)^*A_n(X)$ are quadratic in $X$ whereas the invariance principle we invoke requires bounds on the sup-norm of derivatives of the Stieltjes transform.

Lemma C.3. Let $X$ and $Y$ be as in Theorem C.2. Let $\nu_n^X$ and $\nu_n^Y$ be the ESDs of $A_n(X)^*A_n(X)$ and $A_n(Y)^*A_n(Y)$. Then $\nu_n^X - \nu_n^Y \to 0$ weakly as $n \to \infty$.

Proof. Let

$$H_n(X) = \begin{pmatrix} 0 & A_n(X) \\ A_n(X)^* & 0 \end{pmatrix}$$

have ESD $\theta_n^X$. The eigenvalues of $H_n(X)$ are exactly the positive and negative square roots of the eigenvalues of $A_n(X)^*A_n(X)$. Thus we must show that $\theta_n^X - \theta_n^Y \to 0$ weakly, in probability. Fix any $\alpha$ in the upper half plane and let $f(X) := \frac{1}{2n}\operatorname{Tr}(H_n(X) - \alpha I)^{-1}$. The proof is complete if we show that $\mathbf{E}[f(X)] - \mathbf{E}[f(Y)] \to 0$ for any $\alpha$ with $\mathrm{Im}\{\alpha\} > 0$. This can be done by following the same calculations as in [3]. It works because the entries of $H_n(X)$ are linear in $X$ and hence the first partial derivative of $H_n$ with respect to any $x_{ij}$ is a constant matrix. One must also use the upper bound on $\sigma_{ij}$ to bound the derivatives of $f$. $\Box$
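The spectral relation used in this proof, that the Hermitized block matrix has eigenvalues $\pm\sigma_i(A)$, can be checked on a small example. The sketch below is an illustration under assumptions of ours: a hypothetical real $2 \times 2$ matrix $A$ (so $A^* = A^T$), singular values computed from the trace and determinant of $A^TA$, and a naive cofactor-expansion determinant used to evaluate the characteristic polynomial of the $4 \times 4$ matrix $H$.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4 x 4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

# Hypothetical real 2 x 2 matrix A and its transpose.
A = [[1.0, 2.0], [3.0, -1.0]]
At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

# Hermitization H = [[0, A], [A^T, 0]] as a 4 x 4 list of rows.
H = [[0.0, 0.0] + A[0],
     [0.0, 0.0] + A[1],
     At[0] + [0.0, 0.0],
     At[1] + [0.0, 0.0]]

# Singular values of A: sigma_1^2 + sigma_2^2 = sum of squares, sigma_1 sigma_2 = |det A|.
t = sum(x * x for row in A for x in row)
d = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
disc = math.sqrt(t * t - 4.0 * d * d)
sigmas = [math.sqrt((t + disc) / 2.0), math.sqrt((t - disc) / 2.0)]

def char_poly(x):
    """det(H - x I), which vanishes exactly at the eigenvalues of H."""
    shifted = [[H[i][j] - (x if i == j else 0.0) for j in range(4)]
               for i in range(4)]
    return det(shifted)

# Evaluate the characteristic polynomial at +sigma_i and -sigma_i.
residuals = [char_poly(s * sign) for s in sigmas for sign in (1.0, -1.0)]
print(sigmas, residuals)
```

Since the four candidate values $\pm\sigma_1, \pm\sigma_2$ are distinct here and the characteristic polynomial has degree four, vanishing residuals account for the whole spectrum of $H$.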



Remark: Obviously the same conclusion holds for $A_n - zI$, just by absorbing $zI$ into $M_n$.



Proof of Theorem C.2. The conditions on $M_n$ and $C_n$ show that the first condition of Theorem 2.1 is satisfied (where the two matrices $A_n$ and $B_n$ are now $A_n(X)$ and $A_n(Y)$).

Thus we only need to show an analogue of Proposition 2.2 (only the weak part). We sketch the modifications needed.

(1) Lemma 4.1 can be proved under independence and $\kappa$-controlled
second moment without i.i.d. assumption (see [SHI Theorem 
2.5]). If we make i.i.d. assumption, then Lemma 4.1 is itself 
applicable, which explains the first remark after the statement 
of the theorem. 



The upper bounds on singular values in (31) are very general and hold in our setting for the same reasons. Hence we reduce to Lemma 4.2 and Lemma 4.3 as in the paper.

(2) The high-dimensional contribution (analogue of Lemma 4.2) is proved almost the same way. In the proof of the lower tail bound (Proposition 5.1) use the bounds on $\sigma_{ij}$ appropriately. In particular, we get a lower bound of $a^2(n - d)$ for the second moment of $\operatorname{dist}(X, W)$ in Lemma 5.3, and in applying Theorem 5.2 we get a Lipschitz constant of $b$ for $F(X) = \operatorname{dist}(X, W)$.

(3) In the low-dimensional contribution (Lemma 4.3), the calculations in sections 6.1, 6.5 and 6.10 are exactly as before (in



section 6^, we use the concentration result already outlined in 
the previous step). 
(4) That leaves section 6.2, which is the only step that is handled differently. Here we apply Lemma C.3 instead of quoting Theorem B.1.

$\Box$